POLYNUCLEOTIDE ENCODING A POLYPEPTIDE HAVING
HEPARANASE ACTIVITY AND EXPRESSION OF SAME IN 
GENETICALLY MODIFIED CELLS 

FIELD AND BACKGROUND OF THE INVENTION

The present invention relates to a polynucleotide, referred to 
hereinbelow as hpa, encoding a polypeptide having heparanase activity, 
vectors (nucleic acid constructs) including same and genetically modified 
cells expressing heparanase. The invention further relates to a recombinant 
protein having heparanase activity and to antisense oligonucleotides,
constructs and ribozymes for downregulating heparanase activity. In
addition, the invention relates to heparanase promoter sequences and their 
uses. 

Heparan sulfate proteoglycans: Heparan sulfate proteoglycans 
(HSPG) are ubiquitous macromolecules associated with the cell surface and 
extracellular matrix (ECM) of a wide range of cells of vertebrate and
invertebrate tissues (1-4). The basic HSPG structure includes a protein core 
to which several linear heparan sulfate chains are covalently attached. 
These polysaccharide chains are typically composed of repeating hexuronic 
and D-glucosamine disaccharide units that are substituted to a varying 
extent with N- and O-linked sulfate moieties and N-linked acetyl groups
(1-4). Studies on the involvement of ECM molecules in cell attachment,
growth and differentiation revealed a central role of HSPG in embryonic 
morphogenesis, angiogenesis, neurite outgrowth and tissue repair (1-5). 
HSPG are prominent components of blood vessels (3). In large blood 
vessels they are concentrated mostly in the intima and inner media, whereas 
in capillaries they are found mainly in the subendothelial basement 
membrane where they support proliferating and migrating endothelial cells 
and stabilize the structure of the capillary wall. The ability of HSPG to
interact with ECM macromolecules such as collagen, laminin and 
fibronectin, and with different attachment sites on plasma membranes 
suggests a key role for this proteoglycan in the self-assembly and 
insolubility of ECM components, as well as in cell adhesion and 
locomotion. Cleavage of the heparan sulfate (HS) chains may therefore 
result in degradation of the subendothelial ECM and hence may play a 
decisive role in extravasation of blood-borne cells. HS catabolism is
observed in inflammation, wound repair, diabetes, and cancer metastasis, 
suggesting that enzymes which degrade HS play important roles in 
pathologic processes. Heparanase activity has been described in activated
immune system cells and highly metastatic cancer cells (6-8), but research
has been handicapped by the lack of biologic tools to explore potential 
causative roles of heparanase in disease conditions. 

Involvement of Heparanase in Tumor Cell Invasion and 
Metastasis: Circulating tumor cells arrested in the capillary beds of 
different organs must invade the endothelial cell lining and degrade its 
underlying basement membrane (BM) in order to invade into the 
extravascular tissue(s) where they establish metastasis (9, 10). Metastatic 
tumor cells often attach at or near the intercellular junctions between 
adjacent endothelial cells. Such attachment of the metastatic cells is 
followed by rupture of the junctions, retraction of the endothelial cell 
borders and migration through the breach in the endothelium toward the 
exposed underlying BM (9). Once located between endothelial cells and the 
BM, the invading cells must degrade the subendothelial glycoproteins and 
proteoglycans of the BM in order to migrate out of the vascular 
compartment. Several cellular enzymes (e.g., collagenase IV, plasminogen 
activator, cathepsin B, elastase, etc.) are thought to be involved in 
degradation of BM (10). Among these enzymes is an endo-β-D-glucuronidase
(heparanase) that cleaves HS at specific intrachain sites (6,
8, 11). Expression of an HS degrading heparanase was found to correlate
with the metastatic potential of mouse lymphoma (11), fibrosarcoma and
melanoma (8) cells. Moreover, elevated levels of heparanase were detected
in sera from metastatic tumor bearing animals and melanoma patients (8)
and in tumor biopsies of cancer patients (12). 

The control of cell proliferation and tumor progression by the local 
microenvironment, focusing on the interaction of cells with the 
extracellular matrix (ECM) produced by cultured corneal and vascular 
endothelial cells, was investigated previously by the present inventors. This 
cultured ECM closely resembles the subendothelium in vivo in its 
morphological appearance and molecular composition. It contains 
collagens (mostly type III and IV, with smaller amounts of types I and V), 
proteoglycans (mostly heparan sulfate- and dermatan sulfate- proteoglycans,
with smaller amounts of chondroitin sulfate proteoglycans), laminin,
fibronectin, entactin and elastin (13, 14). The ability of cells to degrade HS 
in the cultured ECM was studied by allowing cells to interact with a 
metabolically sulfate labeled ECM, followed by gel filtration (Sepharose 
6B) analysis of degradation products released into the culture medium (11). 
While intact HSPG are eluted next to the void volume of the column
(Kav < 0.2, Mr ~ 0.5 x 10^6), labeled degradation fragments of HS side chains
are eluted more toward the Vt of the column (0.5 < Kav < 0.8, Mr = 5-7 x 10^3)
(11). 
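The partition coefficient Kav quoted above is conventionally computed from the elution volume of a peak (Ve), the void volume (V0) and the total bed volume (Vt) as Kav = (Ve - V0)/(Vt - V0). A minimal Python sketch of that calculation follows; the function name and the example volumes are hypothetical and are not taken from the cited experiments.

```python
def kav(elution_volume_ml, void_volume_ml, total_volume_ml):
    """Return the gel filtration partition coefficient Kav.

    Kav = (Ve - V0) / (Vt - V0), where Ve is the elution volume of the
    peak, V0 the void volume and Vt the total bed volume of the column.
    """
    if total_volume_ml <= void_volume_ml:
        raise ValueError("total volume must exceed void volume")
    return (elution_volume_ml - void_volume_ml) / (total_volume_ml - void_volume_ml)


# Hypothetical column: V0 = 10 ml, Vt = 30 ml (illustrative values only).
intact_peak = kav(12.0, 10.0, 30.0)    # elutes near the void volume
fragment_peak = kav(23.0, 10.0, 30.0)  # elutes toward Vt
```

With these assumed volumes the first peak falls in the Kav < 0.2 range described for intact HSPG and the second in the 0.5 < Kav < 0.8 range described for HS degradation fragments.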

The heparanase inhibitory effect of various non-anticoagulant 
species of heparin that might be of potential use in preventing extravasation 
of blood-borne cells was also investigated by the present inventors. 
Inhibition of heparanase was best achieved by heparin species containing 16 
sugar units or more and having sulfate groups at both the N and O positions. 
While O-desulfation abolished the heparanase inhibiting effect of heparin, 
O-sulfated, N-acetylated heparin retained a high inhibitory activity,
provided that the N-substituted molecules had a molecular size of about 
4,000 daltons or more (7). Treatment of experimental animals with 
heparanase inhibitors (e.g., non-anticoagulant species of heparin) markedly 
reduced (>90%) the incidence of lung metastases induced by B16 
melanoma, Lewis lung carcinoma and mammary adenocarcinoma cells (7, 
8, 16). Heparin fractions with high and low affinity to anti-thrombin III 
exhibited a comparable high anti-metastatic activity, indicating that the 
heparanase inhibiting activity of heparin, rather than its anticoagulant 
activity, plays a role in the anti-metastatic properties of the polysaccharide.

Heparanase activity in the urine of cancer patients: In an attempt 
to further elucidate the involvement of heparanase in tumor progression 
and its relevance to human cancer, urine samples were screened for
heparanase activity (16a). Heparanase activity was detected in the urine of some,
but not all, cancer patients. High levels of heparanase activity were 
determined in the urine of patients with an aggressive metastatic disease and 
there was no detectable activity in the urine of healthy donors. 

Heparanase activity was also found in the urine of 20% of normal
and microalbuminuric insulin dependent diabetes mellitus (IDDM) patients,
most likely due to diabetic nephropathy, the most important single disorder
leading to renal failure in adults.

Possible involvement of heparanase in tumor angiogenesis: 
Fibroblast growth factors are a family of structurally related polypeptides
characterized by high affinity to heparin (17). They are highly mitogenic 
for vascular endothelial cells and are among the most potent inducers of 
neovascularization (17, 18). Basic fibroblast growth factor (bFGF) has been 
extracted from the subendothelial ECM produced in vitro (19) and from 
basement membranes of the cornea (20), suggesting that ECM may serve as
a reservoir for bFGF. Immunohistochemical staining revealed the
localization of bFGF in basement membranes of diverse tissues and blood 
vessels (21). Despite the ubiquitous presence of bFGF in normal tissues,
endothelial cell proliferation in these tissues is usually very low, suggesting
that bFGF is somehow sequestered from its site of action. Studies on the
interaction of bFGF with ECM revealed that bFGF binds to HSPG in the 
ECM and can be released in an active form by HS degrading enzymes (15, 
20, 22). It was demonstrated that heparanase activity expressed by platelets,
mast cells, neutrophils, and lymphoma cells is involved in release of active
bFGF from ECM and basement membranes (23), suggesting that
heparanase activity may not only function in cell migration and invasion 
but may also elicit an indirect neovascular response. These results suggest 
that the ECM HSPG provides a natural storage depot for bFGF and possibly 
other heparin-binding growth promoting factors (24, 25). Displacement of 
bFGF from its storage within basement membranes and ECM may therefore 
provide a novel mechanism for induction of neovascularization in normal 
and pathological situations. 

Recent studies indicate that heparin and HS are involved in binding 
of bFGF to high affinity cell surface receptors and in bFGF cell signaling
(26, 27). Moreover, the size of HS required for optimal effect was similar
to that of HS fragments released by heparanase (28). Similar results were
obtained with vascular endothelial cell growth factor (VEGF) (29),
suggesting the operation of a dual receptor mechanism involving HS in cell
interaction with heparin-binding growth factors. It is therefore proposed
that restriction of endothelial cell growth factors in ECM prevents their
systemic action on the vascular endothelium, thus maintaining a very low
rate of endothelial cell turnover and vessel growth. On the other hand,
release of bFGF from storage in ECM as a complex with an HS fragment may
elicit localized endothelial cell proliferation and neovascularization in
processes such as wound healing, inflammation and tumor development 
(24, 25). 

Expression of heparanase by cells of the immune system: 
Heparanase activity correlates with the ability of activated cells of the 
immune system to leave the circulation and elicit both inflammatory and 
autoimmune responses. Interaction of platelets, granulocytes, T and B 
lymphocytes, macrophages and mast cells with the subendothelial ECM is 
associated with degradation of HS by a specific heparanase activity (6).
The enzyme is released from intracellular compartments (e.g., lysosomes,
specific granules, etc.) in response to various activation signals (e.g.,
thrombin, calcium ionophore, immune complexes, antigens, mitogens, etc.),
suggesting its regulated involvement in inflammation and cellular
immunity. 

Some of the observations regarding the heparanase enzyme were 
reviewed in reference No. 6 and are listed hereinbelow: 

First, a proteolytic activity (plasminogen activator) and heparanase 
participate synergistically in sequential degradation of the ECM HSPG by 
inflammatory leukocytes and malignant cells.

Second, a large proportion of the platelet heparanase exists in a latent 
form, probably as a complex with chondroitin sulfate. The latent enzyme is 
activated by tumor cell-derived factor(s) and may then facilitate cell 
invasion through the vascular endothelium in the process of tumor 
metastasis. 

Third, release of the platelet heparanase from α-granules is induced
by a strong stimulant (i.e., thrombin), but not in response to platelet 
activation on ECM. 

Fourth, the neutrophil heparanase is preferentially and readily 
released in response to a threshold activation and upon incubation of the 
cells on ECM. 

Fifth, contact of neutrophils with ECM inhibited release of noxious 
enzymes (proteases, lysozyme) and oxygen radicals, but not of enzymes 
(heparanase, gelatinase) which may enable diapedesis. This protective role 
of the subendothelial ECM was observed when the cells were stimulated 
with soluble factors but not with phagocytosable stimulants. 

Sixth, intracellular heparanase is secreted within minutes after 
exposure of T cell lines to specific antigens. 

Seventh, mitogens (Con A, LPS) induce synthesis and secretion of 
heparanase by normal T and B lymphocytes maintained in vitro. T
lymphocyte heparanase is also induced by immunization with antigen in
vivo.

Eighth, heparanase activity is expressed by pre-B lymphomas and B- 
lymphomas, but not by plasmacytomas and resting normal B lymphocytes.

Ninth, heparanase activity is expressed by activated macrophages 
during incubation with ECM, but there was little or no release of the
enzyme into the incubation medium. Similar results were obtained with 
human myeloid leukemia cells induced to differentiate to mature 
macrophages.

Tenth, T-cell mediated delayed type hypersensitivity and
experimental autoimmunity are suppressed by low doses of heparanase 
inhibiting non-anticoagulant species of heparin (30). 

Eleventh, heparanase activity expressed by platelets, neutrophils and 
metastatic tumor cells releases active bFGF from ECM and basement
membranes. Release of bFGF from storage in ECM may elicit a localized 
neovascular response in processes such as wound healing, inflammation and 
tumor development. 

Twelfth, among the breakdown products of the ECM generated by 
heparanase is a tri-sulfated disaccharide that can inhibit T-cell mediated
inflammation in vivo (31). This inhibition was associated with an inhibitory
effect of the disaccharide on the production of biologically active TNFα by
activated T cells in vitro (31). 

Other potential therapeutic applications: Apart from its
involvement in tumor cell metastasis, inflammation and autoimmunity, 
mammalian heparanase may be applied to modulate: bioavailability of 
heparin-binding growth factors (15); cellular responses to heparin-binding 
growth factors (e.g., bFGF, VEGF) and cytokines (IL-8) (31a, 29); cell 
interaction with plasma lipoproteins (32); cellular susceptibility to certain 
viral and some bacterial and protozoal infections (33, 33a, 33b); and
disintegration of amyloid plaques (34). Heparanase may thus prove useful 
for conditions such as wound healing, angiogenesis, restenosis,
atherosclerosis, inflammation, neurodegenerative diseases and viral
infections. Mammalian heparanase can be used to neutralize plasma
heparin, as a potential replacement of protamine. Anti-heparanase 
antibodies may be applied for immunodetection and diagnosis of 
micrometastases, autoimmune lesions and renal failure in biopsy specimens, 
plasma samples, and body fluids. Common use in basic research is 
expected. 

The identification of the hpa gene encoding the heparanase enzyme
will enable the production of a recombinant enzyme in heterologous
expression systems. Availability of the recombinant protein will pave the
way for solving the protein structure-function relationship and will provide
a tool for developing new inhibitors. 

Viral Infection: The presence of heparan sulfate on cell surfaces
has been shown to be the principal requirement for the binding of Herpes
Simplex (33) and Dengue (33a) viruses to cells and for subsequent infection
of the cells. Removal of the cell surface heparan sulfate by heparanase may
therefore abolish virus infection. In fact, treatment of cells with bacterial
heparitinase (degrading heparan sulfate) or heparinase (degrading heparin)
reduced the binding of two related animal herpes viruses to cells and
rendered the cells at least partially resistant to virus infection (33). There 
are some indications that the cell surface heparan sulfate is also involved in 
HIV infection (33b). 

Neurodegenerative diseases: Heparan sulfate proteoglycans were 
identified in the prion protein amyloid plaques of Gerstmann-Straussler
Syndrome, Creutzfeldt-Jakob disease and Scrapie (34). Heparanase may
disintegrate these amyloid plaques which are also thought to play a role in 
the pathogenesis of Alzheimer's disease. 

Restenosis and Atherosclerosis: Proliferation of arterial smooth 
muscle cells (SMCs) in response to endothelial injury and accumulation of 
cholesterol rich lipoproteins are basic events in the pathogenesis of 
atherosclerosis and restenosis (35). Apart from its involvement in SMC
proliferation (i.e., low affinity receptors for heparin-binding growth factors),
HS is also involved in lipoprotein binding, retention and uptake (36). It was 
demonstrated that HSPG and lipoprotein lipase participate in a novel 
catabolic pathway that may allow substantial cellular and interstitial 
accumulation of cholesterol rich lipoproteins (32). The latter pathway is 
expected to be highly atherogenic by promoting accumulation of apoB and 
apoE rich lipoproteins (i.e., LDL, VLDL, chylomicrons), independent of
feedback inhibition by the cellular sterol content. Removal of SMC HS by
heparanase is therefore expected to inhibit both SMC proliferation and lipid 
accumulation and thus may halt the progression of restenosis and 
atherosclerosis. 

Gene therapy: 

The ultimate goal in the management of inherited as well as acquired 
diseases is a rational therapy with the aim to eliminate the underlying
biochemical defects associated with the disease rather than symptomatic
treatment. Gene therapy is a promising candidate to meet these objectives.
Initially it was developed for treatment of genetic disorders; however, the
consensus view today is that it offers the prospect of providing therapy for a
variety of acquired diseases, including cancer, viral infections, vascular
diseases and neurodegenerative disorders. 

The gene-based therapeutic can act either intracellularly, affecting 
only the cells to which it is delivered, or extracellularly, using the recipient 
cells as local endogenous factories for the therapeutic product(s). The
application of gene therapy may follow any of the following strategies: (i)
prophylactic gene therapy, such as using gene transfer to protect cells 
against viral infection; (ii) cytotoxic gene therapy, such as cancer therapy, 
where genes encode cytotoxic products to render the target cells vulnerable 
to attack by the normal immune response; (iii) biochemical correction, 
primarily for the treatment of single gene defects, where a normal copy of 
the gene is added to the affected or other cells. 

To allow efficient transfer of the therapeutic genes, a variety of gene 
delivery techniques have been developed based on viral and non-viral 
vector systems. The most widely used and most efficient systems for 
delivering genetic material into target cells are viral vectors. So far, 329 
clinical studies (phase I, I/II and II) with over 2,500 patients have been
initiated worldwide since 1989 (50).

The gene addition approach faces serious barriers. The expression
of many genes is tightly regulated and context dependent, so achieving the 
correct balance and function of expression is challenging. The gene itself is 
often quite large, containing many exons and introns. The delivery vector is 
usually a virus, which can infect with a high efficiency but may, on the
other hand, induce an immunological response and consequently decrease
effectiveness, especially upon secondary administration. Most of the
current expression vector-based gene therapy protocols fail to achieve 
clinically significant transgene expression required for treating genetic 
diseases. Apparently, it is difficult to deliver enough virus to the right cell
type to elicit an effective therapeutic response (51).

Homologous recombination, which was initially considered to be of 
limited use for gene therapy because of its low frequency in mammalian 
cells, has recently emerged as a potential strategy for developing gene 
therapy. Different approaches have been used to study homologous 
recombination in mammalian cells; some involve DNA repair mechanisms. 
These studies aimed at either gene disruption or gene correction and include 
RNA/DNA chimeric oligonucleotides, small or large homologous DNA 
fragments, or adeno-associated viral vectors. Most of these studies show a 
reasonable frequency of homologous recombination, which warrants further
in vivo testing (52). Homologous recombination-based gene therapy has the
potential to develop into a powerful therapeutic modality for genetic
diseases. It can offer permanent expression and normal regulation of 
corrected genes in appropriate cells or organs and probably can be used for 
treating dominantly inherited diseases such as polycystic kidney disease.

Genomic sequences function in regulation of gene expression:
The efficient expression of therapeutic genes in target cells or tissues 
is an important component of efficient and safe gene therapy. The 
expression of genes is driven by the promoter region upstream of the coding 
sequence, although regulation of expression may be supplemented by 
farther upstream or downstream DNA sequences or DNA in the introns of 
the gene. Since this important information is embedded in the DNA, the 
description of gene structure is crucial to the analysis of gene regulation. 
Characterization of cell specific or tissue specific promoters, as well as 
other tissue specific regulatory elements enables the use of such sequences 
to direct efficient cell specific, or developmental stage specific gene 
expression. This information provides the basis for targeting individual 
genes and for control of their expression by exogenous agents, such as 
drugs. Identification of transcription factors and other regulatory proteins 
required for proper gene expression will point at new potential targets for 
modulating gene expression, when so desired or required. 

Efficient expression of many mammalian genes depends on the 
presence of at least one intron. The expression of mouse thymidylate 
synthase (TS) gene, for example, is greatly influenced by intron sequences. 
The addition of almost any of the introns from the mouse TS gene to an 
intronless TS minigene leads to a large increase in expression (42). The 
involvement of intron 1 in the regulation of expression was demonstrated 
for many other genes. In human factor IX (hFIX), intron 1 is able to 
increase the expression level about 3-fold as compared to that of the
hFIX cDNA (43). The expression enhancing activity of intron 1 is due to 
efficient functional splicing sequences, present in the precursor mRNA. By 
being efficiently assembled into spliceosome complexes, transcripts with 
splicing sequences may be better protected in the nucleus from random 
degradations, than those without such sequences (44). 

A forward-inserted intron 1-carrying hFIX expression cassette was
suggested to be useful for directed gene transfer, while for retroviral-
mediated gene transfer systems, a reversely-inserted intron 1-carrying hFIX
expression cassette was considered (43).

A highly conserved cis-acting sequence element was identified in the
first intron of the mouse and rat c-Ha-ras, and in the first exon of Ha- and 
Ki-ras genes of human, mouse and rat. This cis-acting regulatory sequence 
confers strong transcription enhancer activity that is differentially 
modulated by steroid hormones in metastatic and nonmetastatic
subpopulations. Perturbations in the regulatory activities of such cis-acting
sequences may play an important role in governing oncogenic potency of 
Ha-ras through transcriptional control mechanisms (45). 

Intron sequences affect tissue specific, as well as inducible gene 
expression. A 182 bp intron 1 DNA segment of the mouse Col2a1 gene
contains the necessary information to confer high-level, temporally correct,
chondrocyte expression on a reporter gene in intact mouse embryos, while
Col2a1 promoter sequences are dispensable for chondrocyte expression
(46). In the Col1A1 gene the intron plays little or no role in constitutive
expression of collagen in the skin, and in cultured cells derived from the
skin; however, in the lungs of young mice, intron deletion results in a
decrease of expression to less than 50% (47).

A classical enhancer activity was shown in the 2 kb intron fragment
in the bovine beta-casein gene. The enhancer activity was largely dependent on
the lactogenic hormones, especially prolactin. It was suggested that several
elements in intron 1 of the bovine beta-casein gene cooperatively
interact not only with each other but also with its promoter for hormonal
induction (48).

Identification and characterization of regulatory elements in genomic 
non-coding sequences, such as introns, provides a tool for designing and 
constructing novel vectors for tissue specific, hormone regulated or any 
other defined expression pattern, for gene therapy. Such an expression 
cassette was developed, utilizing regulatory elements from the human 
cytokeratin 18 (K18) gene, including 5' genomic sequences and one of its 
introns. This cassette efficiently expresses reporter genes, as well as the 
human cystic fibrosis transmembrane conductance regulator (CFTR) gene, 
in cultured lung epithelial cells (49).

Alternative splicing:

Alternative splicing of pre-mRNA is a powerful and versatile
regulatory mechanism that can effect quantitative control of gene
expression and functional diversification of proteins. It contributes to major
developmental decisions and also to a fine-tuning of gene function. Genetic
and biochemical approaches have identified cis-acting regulatory elements 
and trans-acting factors that control alternative splicing of specific mRNAs. 
This mechanism results in the generation of variant isoforms of various
proteins from a single gene. These include cell surface molecules such as 
CD44, receptors, cytokines such as VEGF and enzymes. Products of
alternatively spliced transcripts differ in their expression pattern, substrate
specificity and other biological parameters. 

The FGF receptor RNA undergoes alternative splicing which results 
in the production of several isoforms, which exhibit different ligand binding 
specificities. The alternative splicing is regulated in a cell specific manner
(53). 

Alternatively spliced mRNAs are often correlated with malignancy.
An increase in a specific splice variant of tyrosinase was identified in murine
melanomas (54). Multiple splicing variants of estrogen receptor are present
in individual human breast tumors. CD44 has various isoforms, some of which are
characteristic of malignant tissues.

Identification of tumor specific alternative splice variants provides
a new tool for cancer diagnostics. CD44 variants have been used for
detection of malignancy in urine samples from patients with urothelial
cancer by competitive RT-PCR (55). CD44 exon 6 was suggested as
a prognostic indicator of metastasis in breast cancer (56).

Different enzymes or polypeptides generated by alternative splicing 
may have different functions or catalytic specificities. The identification and
characterization of the enzyme forms, which are involved in pathological
processes, is crucial for the design of appropriate and efficient drugs.

Modulation of gene expression - Antisense technology:
An antisense oligonucleotide (e.g., antisense 
oligodeoxyribonucleotide) may bind its target nucleic acid either by 
Watson-Crick base pairing or Hoogsteen and anti-Hoogsteen base pairing 
(64). According to the Watson-Crick base pairing, heterocyclic bases of the 
antisense oligonucleotide form hydrogen bonds with the heterocyclic bases 
of target single-stranded nucleic acids (RNA or single-stranded DNA), 
whereas according to the Hoogsteen base pairing, the target nucleic acid
is double-stranded DNA, wherein a third strand is
accommodated in the major groove of the B-form DNA duplex by 
Hoogsteen and anti-Hoogsteen base pairing to form a triple helix structure. 

According to both the Watson-Crick and the Hoogsteen base pairing 
models, antisense oligonucleotides have the potential to regulate gene 
expression and to disrupt the essential functions of the nucleic acids in cells. 
Therefore, antisense oligonucleotides have possible uses in modulating a 
wide range of diseases in which gene expression is altered. 

Since the development of effective methods for chemically 
synthesizing oligonucleotides, these molecules have been extensively used
in biochemistry and biological research and have potential use in
medicine, since carefully devised oligonucleotides can be used to control 
gene expression by regulating levels of transcription, transcripts and/or 
translation. 

Oligodeoxyribonucleotides as long as 100 base pairs (bp) are 
routinely synthesized by solid phase methods using commercially available, 
fully automated synthesis machines. The chemical synthesis of
oligoribonucleotides, however, is far less routine. Oligoribonucleotides are 
also much less stable than oligodeoxyribonucleotides, a fact which has 
contributed to the more prevalent use of oligodeoxyribonucleotides in 
medical and biological research, directed at, for example, the regulation of 
transcription or translation levels. 

Gene expression involves a few distinct and well regulated steps. The
first major step of gene expression involves transcription of a messenger 
RNA (mRNA) which is an RNA sequence complementary to the antisense 
(i.e., -) DNA strand, or, in other words, identical in sequence to the DNA 
sense (i.e., +) strand, composing the gene. In eukaryotes, transcription
occurs in the cell nucleus. 
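The strand relationships described above can be illustrated with a minimal Python sketch. It assumes plain A/C/G/T input and ignores promoters, introns and RNA processing; the function names and the nine-base example sequence are hypothetical and are not taken from the hpa gene.

```python
# Watson-Crick partners for DNA bases.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the complementary DNA strand, written 5' to 3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))


def transcribe(sense_dna):
    """Return the mRNA: identical to the sense (+) strand, with U for T."""
    return sense_dna.replace("T", "U")


def antisense_oligo(sense_dna):
    """Return a DNA oligonucleotide complementary to the mRNA (5' to 3')."""
    return reverse_complement(sense_dna)


sense_strand = "ATGGCTTCA"                 # hypothetical sense (+) strand
antisense_strand = reverse_complement(sense_strand)
mrna = transcribe(sense_strand)            # same sequence as sense, T -> U
oligo = antisense_oligo(sense_strand)      # pairs with the mRNA
```

In this simplified picture, the antisense oligonucleotide has the same sequence as the antisense (-) DNA strand, which is why it can hybridize to the mRNA by Watson-Crick base pairing.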

The second major step of gene expression involves translation of a 
protein (e.g., enzymes, structural proteins, secreted proteins, gene
expression factors, etc.) in which the mRNA interacts with ribosomal RNA 
complexes (ribosomes) and amino acid activated transfer RNAs (tRNAs) to 
direct the synthesis of the protein coded for by the mRNA sequence. 

Initiation of transcription requires specific recognition of a promoter 
DNA sequence located upstream to the coding sequence of a gene by an 
RNA-synthesizing enzyme - RNA polymerase. This recognition is 
preceded by sequence-specific binding of one or more transcription factors 
to the promoter sequence. Additional proteins which bind at or close to the 
promoter sequence may trans upregulate transcription via cis elements 
known as enhancer sequences. Other proteins which bind to or close to the 
promoter, but whose binding prohibits the action of RNA polymerase, are 
known as repressors. 

There is also evidence that in some cases gene expression is
downregulated by endogenous antisense RNA repressors that bind a 
complementary mRNA transcript and thereby prevent its translation into a 
functional protein. 

Thus, gene expression is typically upregulated by transcription 
factors and enhancers and downregulated by repressors.

However, in many disease situations gene expression is impaired. In
many cases, such as different types of cancer, for various reasons the 
expression of a specific endogenous or exogenous (e.g., of a pathogen such 
as a virus) gene is upregulated. Furthermore, in infectious diseases caused 
by pathogens such as parasites, bacteria or viruses, the disease progression 
depends on expression of the pathogen genes; this phenomenon may also be
considered, as far as the patient is concerned, as upregulation of exogenous
genes. 

Most conventional drugs function by interaction with and modulation 
of one or more targeted endogenous or exogenous proteins, e.g., enzymes. 
Such drugs, however, typically are not specific for targeted proteins but 
interact with other proteins as well. Thus, a relatively large dose of drug 
must be used to effectively modulate a targeted protein. 

Typical daily doses of drugs are from 10⁻⁵ to 10⁻¹ millimoles per
kilogram of body weight, or 10⁻³ to 10 millimoles for a 100 kilogram person.
If this modulation instead could be effected by interaction with and 
inactivation of mRNA, a dramatic reduction in the necessary amount of 
drug could likely be achieved, along with a corresponding reduction in side 
effects. Further reductions could be effected if such interaction could be 
rendered site-specific. Given that a functioning gene continually produces 
mRNA, it would thus be even more advantageous if gene transcription 
could be arrested in its entirety. 
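
The dose range quoted above is a simple per-kilogram scaling, which the following sketch reproduces. The function name is hypothetical and the snippet only restates the arithmetic in the text.

```python
# Worked arithmetic for the dose range quoted in the text:
# per-kilogram dose multiplied by body weight gives the total daily dose.

def total_daily_dose_mmol(dose_mmol_per_kg: float, body_weight_kg: float) -> float:
    """Scale a per-kilogram dose (mmol/kg) to a whole-body dose (mmol)."""
    return dose_mmol_per_kg * body_weight_kg


BODY_WEIGHT_KG = 100.0
low = total_daily_dose_mmol(1e-5, BODY_WEIGHT_KG)   # lower bound of the range
high = total_daily_dose_mmol(1e-1, BODY_WEIGHT_KG)  # upper bound of the range

print(f"{low:.0e} to {high:.0e} mmol per day for a {BODY_WEIGHT_KG:.0f} kg person")
```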

Given these facts, it would be advantageous if gene expression could 
be arrested or downmodulated at the transcription level. 

The ability to chemically synthesize oligonucleotides and analogs
thereof having a selected predetermined sequence offers a means of
downmodulating gene expression. Three types of gene expression
modulation strategies may be considered.

At the transcription level, antisense or sense oligonucleotides or 
analogs that bind to the genomic DNA by strand displacement or the 
formation of a triple helix, may prevent transcription (64). 

At the transcript level, antisense oligonucleotides or analogs that 
bind target mRNA molecules lead to the enzymatic cleavage of the hybrid 
by intracellular RNase H (65). In this case, by hybridizing to the targeted 
mRNA, the oligonucleotides or oligonucleotide analogs provide a duplex 
hybrid recognized and destroyed by the RNase H enzyme. Alternatively, 
such hybrid formation may lead to interference with correct splicing (66).
As a result, in both cases, the number of intact target mRNA transcripts
ready for translation is reduced or eliminated.

At the translation level, antisense oligonucleotides or analogs that 
bind target mRNA molecules prevent, by steric hindrance, binding of 
essential translation factors (ribosomes), to the target mRNA, a 
phenomenon known in the art as hybridization arrest, disabling the 
translation of such mRNAs (67). 
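
The transcript-level and translation-level strategies above both rely on an oligonucleotide whose sequence is complementary to a stretch of the target mRNA. A minimal sketch of deriving such a sequence follows; the toy mRNA, the function name and the window coordinates are assumptions for illustration, and no claim is made that an oligonucleotide chosen this way would be effective.

```python
# Sketch: derive a DNA antisense oligonucleotide (written 5' to 3') that is
# the reverse complement of a chosen window of a target mRNA.

RNA_TO_DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def antisense_dna_oligo(mrna: str, start: int, length: int) -> str:
    """Return the DNA reverse complement of mrna[start:start + length].

    `start` is a 0-based offset into the mRNA (an assumption of this sketch).
    """
    target = mrna.upper()[start:start + length]
    if start < 0 or len(target) != length:
        raise ValueError("target window does not fit inside the mRNA")
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(target))


toy_mrna = "AUGUACGGUCCUGAUUGA"  # invented sequence, not a SEQ ID NO
oligo = antisense_dna_oligo(toy_mrna, start=0, length=12)
print(oligo)  # AGGACCGTACAT
```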

Thus, antisense sequences, which as described hereinabove may
arrest the expression of any endogenous and/or exogenous gene depending
on their specific sequence, have attracted much attention from scientists and
pharmacologists devoted to developing the antisense approach
into a new pharmacological tool (68).

For example, several antisense oligonucleotides have been shown to
arrest hematopoietic cell proliferation (69), growth (70) and entry into the S
phase of the cell cycle (71), to reduce survival (72) and to prevent receptor
mediated responses (73). For use of antisense oligonucleotides as antiviral
agents the reader is referred to reference 74.

For efficient in vivo inhibition of gene expression using antisense 
oligonucleotides or analogs, the oligonucleotides or analogs must fulfill the 
following requirements: (i) sufficient specificity in binding to the target
sequence; (ii) solubility in water; (iii) stability against intra- and 
extracellular nucleases; (iv) capability of penetration through the cell 
membrane; and (v) when used to treat an organism, low toxicity. 

Unmodified oligonucleotides are impractical for use as antisense 
sequences since they have short in vivo half-lives, during which they are 
degraded rapidly by nucleases. Furthermore, they are difficult to prepare in 
more than milligram quantities. In addition, such oligonucleotides are poor
cell membrane penetrators (75).

Thus it is apparent that in order to meet all the above listed 
requirements, oligonucleotide analogs need to be devised in a suitable 
manner. Therefore, an extensive search for modified oligonucleotides has 
been initiated. 

For example, problems arising in connection with double-stranded 
DNA (dsDNA) recognition through triple helix formation have been 
diminished by a clever "switch back" chemical linking, whereby a sequence 
of polypurine on one strand is recognized, and by "switching back", a 
homopurine sequence on the other strand can be recognized. Also, good
helix formation has been obtained by using artificial bases, thereby
improving binding conditions with regard to ionic strength and pH. 

In addition, in order to improve half-life as well as membrane
penetration, a large number of variations in polynucleotide backbones have
been made, albeit with little success.

Oligonucleotides can be modified either in the base, the sugar or the
phosphate moiety. These modifications include, for example, the use of
methylphosphonates, monothiophosphates, dithiophosphates,
phosphoramidates, phosphate esters, bridged phosphorothioates, bridged
phosphoramidates, bridged methylenephosphonates, dephospho
internucleotide analogs with siloxane bridges, carbonate bridges,
carboxymethyl ester bridges, acetamide bridges, carbamate bridges,
thioether bridges, sulfoxy bridges, sulfono bridges, various "plastic" DNAs,
α-anomeric bridges and borane derivatives. For further details the reader is
referred to reference 76.

International patent application WO 89/12060 discloses various 
building blocks for synthesizing oligonucleotide analogs, as well as 
oligonucleotide analogs formed by joining such building blocks in a defined 
sequence. The building blocks may be either "rigid" (i.e., containing a ring 
structure) or "flexible" (i.e., lacking a ring structure). In both cases, the 
building blocks contain a hydroxy group and a mercapto group, through 
which the building blocks are said to join to form oligonucleotide analogs. 
The linking moiety in the oligonucleotide analogs is selected from the group
consisting of sulfide (-S-), sulfoxide (-SO-), and sulfone (-SO2-). However, 
the application provides no data supporting the specific binding of an 
oligonucleotide analog to a target oligonucleotide. 

International patent application WO 92/20702 describes an acyclic
oligonucleotide which includes a peptide backbone on which any selected
chemical nucleobases or analogs are strung and serve as coding characters
as they do in natural DNA or RNA. These new compounds, known as
peptide nucleic acids (PNAs), are not only more stable in cells than their
natural counterparts, but also bind natural DNA and RNA 50 to 100 times
more tightly than the natural nucleic acids cling to each other (77). PNA
oligomers can be synthesized from the four protected monomers containing
thymine, cytosine, adenine and guanine by Merrifield solid-phase peptide
synthesis. In order to increase solubility in water and to prevent
aggregation, a lysine amide group is placed at the C-terminal.

Thus, antisense technology requires pairing of messenger RNA with
an oligonucleotide to form a double helix that inhibits translation. The
concept of antisense-mediated gene therapy was already introduced in 1978
for cancer therapy. This approach was based on certain genes that are
crucial in cell division and growth of cancer cells. Synthetic fragments of
DNA, the genetic substance, can achieve this goal. Such molecules bind to the
targeted gene molecules in RNA of tumor cells, thereby inhibiting the
translation of the genes and resulting in dysfunctional growth of these cells.
Other mechanisms have also been proposed. These strategies have been
used, with some success, in treatment of cancers, as well as other illnesses,
including viral and other infectious diseases. Antisense oligonucleotides
are typically synthesized in lengths of 13-30 nucleotides. The life span of
oligonucleotide molecules in blood is rather short. Thus, they have to be
chemically modified to prevent destruction by ubiquitous nucleases present
in the body. Phosphorothioates are a very widely used modification in
ongoing clinical trials of antisense oligonucleotides (57). A new generation of
antisense molecules consists of hybrid antisense oligonucleotides with a
central portion of synthetic DNA, while four bases on each end have been
modified with 2'O-methyl ribose to resemble RNA. In preclinical studies in
laboratory animals, such compounds have demonstrated greater stability to
metabolism in body tissues and an improved safety profile when compared
with the first-generation unmodified phosphorothioate (Hybridon Inc.
news). Dozens of other nucleotide analogs have also been tested in
antisense technology.

RNA oligonucleotides may also be used for antisense inhibition as 
they form a stable RNA-RNA duplex with the target, suggesting efficient 
inhibition. However, due to their low stability RNA oligonucleotides are 
typically expressed inside the cells using vectors designed for this purpose. 
This approach is favored when attempting to target an mRNA that encodes
an abundant and long-lived protein (57). 

Recent scientific publications have validated the efficacy of antisense 
compounds in animal models of hepatitis, cancers, coronary artery 
restenosis and other diseases. The first antisense drug was recently 
approved by the FDA. This drug, Fomivirsen, developed by Isis, is
indicated for local treatment of cytomegalovirus in patients with AIDS who
are intolerant of or have a contraindication to other treatments for CMV
retinitis or who were insufficiently responsive to previous treatments for
CMV retinitis (Pharmacotherapy News Network).

Several antisense compounds are now in clinical trials in the United
States. These include locally administered antivirals and systemic cancer
therapeutics. Antisense therapeutics have the potential to treat many
life-threatening diseases with a number of advantages over traditional drugs.
Traditional drugs intervene after a disease-causing protein is formed.
Antisense therapeutics, however, block mRNA transcription/translation and
intervene before a protein is formed, and since antisense therapeutics target
only one specific mRNA, they should be more effective with fewer side
effects than current protein-inhibiting therapy.

A second option for disrupting gene expression at the level of 
transcription uses synthetic oligonucleotides capable of hybridizing with 
double stranded DNA. A triple helix is formed. Such oligonucleotides may
prevent binding of transcription factors to the gene's promoter and therefore 
inhibit transcription. Alternatively, they may prevent duplex unwinding 
and, therefore, transcription of genes within the triple helical structure. 

Another approach is the use of specific nucleic acid sequences to act 
as decoys for transcription factors. Since transcription factors bind specific 
DNA sequences it is possible to synthesize oligonucleotides that will 
effectively compete with the native DNA sequences for available 
transcription factors in vivo. This approach requires the identification of
a gene-specific transcription factor (57).

Indirect inhibition of gene expression was demonstrated for matrix
metalloproteinase genes (MMP-1, -3, and -9), which are associated with
invasive potential of human cancer cells. E1AF is a transcription activator
of MMP genes. Expression of E1AF antisense RNA in HSC3AS cells
showed a decrease in mRNA and protein levels of MMP-1, -3, and -9.
Moreover, HSC3AS showed lower invasive potential in vitro and in vivo.
These results imply that transfection of antisense inhibits tumor invasion by
down-regulating MMP genes (58).

Ribozymes:

Ribozymes are being increasingly used for the sequence-specific 
inhibition of gene expression by the cleavage of mRNAs encoding proteins 
of interest. The possibility of designing ribozymes to cleave any specific 
target RNA has rendered them valuable tools in both basic research and 
therapeutic applications. In the therapeutics area, ribozymes have been 
exploited to target viral RNAs in infectious diseases, dominant oncogenes 
in cancers and specific somatic mutations in genetic disorders. Most
notably, several ribozyme gene therapy protocols for HIV patients are
already in Phase 1 trials (62). More recently, ribozymes have been used for
transgenic animal research, gene target validation and pathway elucidation. 
Several ribozymes are in various stages of clinical trials. ANGIOZYME 
was the first chemically synthesized ribozyme to be studied in human 
clinical trials. ANGIOZYME specifically inhibits formation of the VEGF-r 
(Vascular Endothelial Growth Factor receptor), a key component in the 
angiogenesis pathway. Ribozyme Pharmaceuticals, Inc., as well as other 
firms have demonstrated the importance of anti-angiogenesis therapeutics in 
animal models. HEPTAZYME, a ribozyme designed to selectively destroy 
Hepatitis C Virus (HCV) RNA, was found effective in decreasing Hepatitis 
C viral RNA in cell culture assays (Ribozyme Pharmaceuticals, 
Incorporated - WEB home page). 

Gene disruption in animal models: 

The emergence of gene inactivation by homologous recombination 
methodology in embryonic stem cells has revolutionized the field of mouse 
genetics. The availability of a rapidly growing number of mouse null 
mutants has represented an invaluable source of knowledge on mammalian 
development, cellular biology and physiology, and has provided many 
models for human inherited diseases. Animal models are required for an 
effective drug delivery development program and evaluation of gene 
therapy approaches. The improvement of the original knockout strategy, as
well as exploitation of exogenous enzymatic systems that are active in the
recombination process, has considerably extended the range of genetic
manipulations that can be produced. Additional methods have been
developed to provide versatile research tools: double replacement method,
sequential gene targeting, conditional cell type specific gene targeting,
single copy integration method, inducible gene targeting, gene disruption by
viral delivery, replacing one gene with another (the so-called knock-in
method) and the induction of specific balanced chromosomal translocation.
It is now possible to introduce a point mutation as a unique change in the 
entire genome, therefore allowing very fine dissection of gene function in 
vivo. Furthermore, the advent of methods allowing conditional gene 
targeting opens the way for analysis of consequence of a particular mutation 
in a defined organ and at a specific time during the life of the experimental 
animal (59). 

DNA vaccination: 

Observations in the early 1990s that plasmid DNA could directly 
transfect animal cells in vivo sparked exploration of the use of DNA
plasmids to induce an immune response by direct injection into animals of DNA
encoding an antigenic protein. When a DNA vaccine plasmid enters the
eukaryotic cell, the protein it encodes is transcribed and translated within 
the cell. In the case of pathogens, these proteins are presented to the 
immune system in their native form, mimicking the presentation of antigens 
during a natural infection. DNA vaccination is particularly useful for the 
induction of T cell activation. It was applied for viral and bacterial 
infectious diseases, as well as for allergy and for cancer. The central 
hypothesis behind active specific immunotherapy for cancer is that tumor 
cells express unique antigens that should stimulate the immune system. The 
first DNA vaccine against a tumor encoded carcinoembryonic antigen (CEA).
DNA vaccination provided immunoprotection against, and immunotherapy of,
human CEA-expressing syngeneic mouse colon and breast carcinoma
(61). In a mouse model of neuroblastoma, DNA immunization with HuD 
resulted in tumor growth inhibition with no neurological disease (60). 
Immunity to the brown locus protein, gp75 tyrosinase-related protein-1,
associated with melanoma, was investigated in a syngeneic mouse model.
Priming with human gp75 DNA broke tolerance to mouse gp75. Immunity 
against mouse gp75 provided significant tumor protection (60). 

Glycosyl hydrolases:

Glycosyl hydrolases are a widespread group of enzymes that
hydrolyze the O-glycosidic bond between two or more carbohydrates or
between a carbohydrate and a noncarbohydrate moiety. The enzymatic
hydrolysis of the glycosidic bond occurs by one of two major
mechanisms, leading to overall retention or inversion of the anomeric
configuration. In both mechanisms catalysis involves two residues: a
proton donor and a nucleophile. Glycosyl hydrolases have been classified
into 58 families based on amino acid similarities. The glycosyl hydrolases
from families 1, 2, 5, 10, 17, 30, 35, 39 and 42 act on a large variety of
substrates; however, they all hydrolyze the glycosidic bond by a general acid
catalysis mechanism, with retention of the anomeric configuration. The
mechanism involves two glutamic acid residues, which are the proton
donor and the nucleophile, with an asparagine always preceding the proton
donor. Analyses of a set of known 3D structures from this group revealed
that their catalytic domains, despite the low level of sequence identity,
adopt a similar (α/β)8 fold with the proton donor and the nucleophile
located at the C-terminal ends of strands β4 and β7, respectively. Mutations
in the functional conserved amino acids of lysosomal glycosyl hydrolases
were identified in lysosomal storage diseases.

Lysosomal glycosyl hydrolases, including β-glucuronidase,
β-mannosidase, β-glucocerebrosidase, β-galactosidase and α-L-iduronidase, are
all exo-glycosyl hydrolases, belong to the GH-A clan and share a similar
catalytic site. However, many endo-glucanases from various organisms,
such as bacterial and fungal xylanases and cellulases, share this catalytic
domain.

Genomic sequence of hpa gene and its implications:
It is well established that heparanase activity is correlated with 
cancer metastasis. This correlation was demonstrated at the level of 
enzymatic activity as well as the levels of protein and hpa cDNA expression 
in highly metastatic cancer cells as compared with non-metastatic cells. As 
such, inhibition of heparanase activity is desirable, and has been attempted 
by several means. The genomic region encoding the hpa gene and its
surroundings provides a new powerful tool for regulation of heparanase
activity at the level of gene expression. Regulatory sequences may reside in
noncoding regions both upstream and downstream of the transcribed region as
well as in intron sequences. A DNA sequence upstream of the transcription
start site contains the promoter region and potential regulatory elements. 
Regulatory factors, which interact with the promoter region may be 
identified and be used as potential drugs for inhibition of cancer, metastasis 
and inflammation. The promoter region can be used to screen for inhibitors 
of heparanase gene expression. Furthermore, the hpa promoter can be used 
to direct cell specific, particularly cancer cell specific, expression of foreign 
genes, such as cytotoxic or apoptotic genes, in order to specifically destroy 
cancer cells. 

Cancer and yet unknown related genetic disorders may involve 
rearrangements and mutations in the heparanase gene, either in coding or 
non-coding regions. Such mutations may affect expression level or 
enzymatic activity. The genomic sequence of hpa enables the amplification
of specific genomic DNA fragments, identification and diagnosis of 
mutations. 

There is thus a widely recognized need for, and it would be highly 
advantageous to have genomic, cDNA and composite polynucleotides 
encoding a polypeptide having heparanase activity, vectors including same, 
genetically modified cells expressing heparanase and a recombinant protein
having heparanase activity, as well as antisense oligonucleotides, constructs
and ribozymes which can be used for downregulating heparanase activity.

SUMMARY OF THE INVENTION

Cloning of the human hpa gene, which encodes heparanase, and
expression of recombinant heparanase by transfected host cells are reported
herein, as well as downregulation of heparanase activity by antisense
technology.

A purified preparation of heparanase isolated from human hepatoma
cells was subjected to tryptic digestion and microsequencing. The
YGPDVGQPR (SEQ ID NO:8) sequence revealed was used to screen EST
databases for homology to the corresponding back translated DNA
sequence. Two closely related EST sequences were identified and were
thereafter found to be identical. Both clones contained an insert of 1020 bp
which included an open reading frame of 973 bp followed by 27 bp of 3'
untranslated region and a poly A tail. A translation start site was not
identified.

Cloning of the missing 5' end of hpa was performed by PCR
amplification of DNA from placenta Marathon RACE cDNA composite
using primers selected according to the EST clones sequence and the linkers
of the composite. A 900 bp PCR fragment, partially overlapping with the
identified 3' encoding EST clones, was obtained. The joined cDNA
fragment (hpa), 1721 bp long (SEQ ID NO:9), contained an open reading
frame which encodes a polypeptide of 543 amino acids (SEQ ID NO:10)
with a calculated molecular weight of 61,192 daltons.

Cloning of an extended 5' sequence from the human SK-hep1 cell line
was enabled by PCR amplification using the Marathon RACE. The 5'
extended sequence of the SK-hep1 hpa cDNA was assembled with the
sequence of the hpa cDNA isolated from human placenta (SEQ ID NO:9).
The assembled sequence contained an open reading frame, SEQ ID NOs: 13 
and 15, which encodes, as shown in SEQ ID NOs:14 and 15, a polypeptide 
of 592 amino acids with a calculated molecular weight of 66,407 daltons. 
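
The figures quoted for the two open reading frames can be cross-checked with simple arithmetic. The sketch below assumes three nucleotides per residue plus one stop codon, and uses the calculated molecular weights stated in the text; the function names are hypothetical.

```python
# Sanity-check arithmetic for the reported open reading frames.
# Assumption: ORF length = 3 nt per residue, plus 3 nt for the stop codon.

def orf_length_nt(n_residues: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode n_residues amino acids."""
    return 3 * n_residues + (3 if include_stop else 0)


def mean_residue_mass(mw_daltons: float, n_residues: int) -> float:
    """Average mass per residue implied by a calculated molecular weight."""
    return mw_daltons / n_residues


reported = [
    ("placenta hpa cDNA (SEQ ID NO:9)", 543, 61192),
    ("assembled extended sequence", 592, 66407),
]
for label, n_res, mw in reported:
    print(f"{label}: ORF {orf_length_nt(n_res)} nt, "
          f"{mean_residue_mass(mw, n_res):.1f} Da per residue")

# The 543-residue ORF must fit inside the 1721 bp joined cDNA fragment.
print(orf_length_nt(543) <= 1721)
```

Both implied per-residue masses fall in the range expected for an average protein, so the residue counts and molecular weights are mutually consistent.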

The ability of the hpa gene product to catalyze degradation of 
heparan sulfate in an in vitro assay was examined by expressing the entire 
open reading frame of hpa in insect cells, using the Baculovirus expression 
system. Extracts and conditioned media of cells infected with virus
containing the hpa gene demonstrated a high level of heparan sulfate
degradation activity both towards soluble ECM-derived HSPG and intact
ECM. This degradation activity was inhibited by heparin, which is another
substrate of heparanase. Cells infected with a similar construct containing
no hpa gene had no such activity, nor did non-infected cells. The activity of
heparanase expressed from the extended 5' clone towards heparin was
demonstrated in a mammalian expression system.

The expression pattern of hpa RNA in various tissues and cell lines 
was investigated using RT-PCR. It was found to be expressed only in 
tissues and cells previously known to have heparanase activity. 

A panel of monochromosomal human/CHO and human/mouse 
somatic cell hybrids was used to localize the human heparanase gene to
human chromosome 4. The newly isolated heparanase sequence can be 
used to identify a chromosome region harboring a human heparanase gene 
in a chromosome spread. 

A human genomic library was screened and the human locus
harboring the heparanase gene was isolated, sequenced and characterized.
Alternatively spliced heparanase mRNAs were identified and characterized. 
The human heparanase promoter has been isolated, identified and positively 
tested for activity. The mouse heparanase promoter has been isolated and 
identified as well. Antisense heparanase constructs were prepared and their 
influence on cells in vitro tested. A predicted heparanase active site was 
identified. Finally, the presence of sequences hybridizing with human
heparanase sequences was demonstrated for a variety of mammals and
for an avian species.

According to one aspect of the present invention there is provided an 
isolated nucleic acid comprising a genomic, complementary or composite 
polynucleotide sequence encoding a polypeptide having heparanase 
catalytic activity. 

According to further features in preferred embodiments of the 
invention described below, the polynucleotide or a portion thereof is 
hybridizable with SEQ ID NOs: 9, 13, 42, 43 or a portion thereof at 68 °C 
in 6 x SSC, 1 % SDS, 5 x Denhardt's, 10 % dextran sulfate, 100 µg/ml salmon
sperm DNA, and 32P labeled probe and wash at 68 °C with 3 x SSC and 0.1
% SDS.

According to still further features in the described preferred 
embodiments the polynucleotide or a portion thereof is at least 60 % 
identical with SEQ ID NOs: 9, 13, 42, 43 or portions thereof as determined 
using the Bestfit procedure of the DNA sequence analysis software package
developed by the Genetics Computer Group (GCG) at the University of
Wisconsin (gap creation penalty - 12, gap extension penalty - 4).
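
The 60 % identity threshold can be illustrated with a small sketch. It assumes the two sequences have already been aligned (for example by an alignment program such as Bestfit) and counts identical residues over the gap-free aligned columns. This is one common convention and is not claimed to reproduce Bestfit's own calculation; the sequences shown are invented.

```python
# Sketch: percent identity between two pre-aligned sequences ('-' marks a gap).
# Assumption: identity is computed over columns where neither sequence has a gap.

def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return the percentage of identical residues over gap-free columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 60.0) -> bool:
    """True when the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


query = "ATGTACGG-TCCT"    # invented example sequences
subject = "ATGAACGGTTCAT"
print(round(percent_identity(query, subject), 1))  # 83.3 (10 of 12 columns)
print(meets_identity_threshold(query, subject))    # True
```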

According to still further features in the described preferred 
embodiments the polypeptide is as set forth in SEQ ID NOs:10, 14, 44 or 
portions thereof.

According to still further features in the described preferred 
embodiments the polypeptide is at least 60 % homologous to SEQ ID 
NOs:10, 14, 44 or portions thereof as determined with the Smith-Waterman
algorithm, using the Bioaccelerator platform developed by Compugene 
(gapop: 10.0, gapext: 0.5, matrix: blosum62). 
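
For readers unfamiliar with the Smith-Waterman algorithm, the following is a minimal local-alignment scoring sketch with affine gap penalties. It is not the Bioaccelerator implementation. As a loudly stated simplification, it scores residues with a flat match/mismatch value instead of the BLOSUM62 matrix, and it assumes that a gap of length k costs gap_open + (k - 1) * gap_ext.

```python
# Sketch of Smith-Waterman local alignment scoring with affine gaps (Gotoh form).
# Simplification: flat match/mismatch scores stand in for the BLOSUM62 matrix.

def smith_waterman_score(a: str, b: str, match: float = 5.0,
                         mismatch: float = -4.0, gap_open: float = 10.0,
                         gap_ext: float = 0.5) -> float:
    """Return the best local alignment score between sequences a and b."""
    neg_inf = float("-inf")
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0.0] * cols for _ in range(rows)]      # best score ending at (i, j)
    E = [[neg_inf] * cols for _ in range(rows)]  # ending with a gap in a
    F = [[neg_inf] * cols for _ in range(rows)]  # ending with a gap in b
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            # Open a new gap or extend an existing one.
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_ext)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_ext)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The zero floor is what makes the alignment local.
            H[i][j] = max(0.0, H[i - 1][j - 1] + sub, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


print(smith_waterman_score("YGPDVGQPR", "YGPDVGQPR"))  # 45.0
print(smith_waterman_score("YGPDVGQPR", "YGPDGQPR"))   # 30.0
```

In the second call the eight matching residues score 40 and the single-residue gap costs 10, giving 30.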

According to additional aspects of the present invention there are 
provided a nucleic acid construct (vector) comprising the isolated nucleic 
acid described herein and a host cell comprising the construct. 

According to a further aspect of the present invention there is 
provided an antisense oligonucleotide comprising a polynucleotide or a 
polynucleotide analog of at least 10 bases being hybridizable in vivo, under 
physiological conditions, with a portion of a polynucleotide strand encoding
a polypeptide having heparanase catalytic activity. 

According to an additional aspect of the present invention there is 
provided a method of in vivo downregulating heparanase activity
comprising the step of in vivo administering the antisense oligonucleotide 
herein described. 

According to yet an additional aspect of the present invention there is 
provided a pharmaceutical composition comprising the antisense 
oligonucleotide herein described and a pharmaceutically acceptable carrier. 

According to still an additional aspect of the present invention there 
is provided a ribozyme comprising the antisense oligonucleotide described 
herein and a ribozyme sequence. 

According to a further aspect of the present invention there is 
provided an antisense nucleic acid construct comprising a promoter 
sequence and a polynucleotide sequence directing the synthesis of an 
antisense RNA sequence of at least 10 bases being hybridizable in vivo, 
under physiological conditions, with a portion of a polynucleotide strand 
encoding a polypeptide having heparanase catalytic activity. 

According to further features in preferred embodiments of the 
invention described below, the polynucleotide strand encoding the 
polypeptide having heparanase catalytic activity is as set forth in SEQ ID 
NOs: 9, 13, 42 or 43.

According to still further features in the described preferred
embodiments the polypeptide having heparanase catalytic activity is as set 
forth in SEQ ID NOs: 10, 14 or 44. 

According to still a further aspect of the present invention there is 
provided a method of in vivo downregulating heparanase activity 
comprising the step of in vivo administering the antisense nucleic acid 
construct herein described. 

According to yet a further aspect of the present invention there is 
provided a pharmaceutical composition comprising the antisense nucleic 
acid construct herein described and a pharmaceutically acceptable carrier.

According to a further aspect of the present invention there is 
provided a nucleic acid construct comprising a polynucleotide sequence 
functioning as a promoter, wherein the polynucleotide sequence is derived from
SEQ ID NO:42 and includes at least nucleotides 2535-2635 thereof or from 
SEQ ID NO:43 and includes at least nucleotides 320-420. 
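
Nucleotide ranges such as 2535-2635 are conventionally 1-based and inclusive, whereas Python slices are 0-based and end-exclusive. The sketch below shows the conversion and a simple containment check. It assumes the 1-based inclusive convention applies here, and it uses a placeholder sequence, not SEQ ID NO:42; the function names are hypothetical.

```python
# Sketch: extract a 1-based, inclusive nucleotide range and test whether a
# candidate construct contains it. The sequence below is a placeholder.

def subsequence_1based(seq: str, first: int, last: int) -> str:
    """Return nucleotides first..last (1-based, inclusive) of seq."""
    if not (1 <= first <= last <= len(seq)):
        raise ValueError("range falls outside the sequence")
    return seq[first - 1:last]


def contains_required_region(construct: str, required_region: str) -> bool:
    """True when the construct includes the required region verbatim."""
    return required_region in construct


placeholder = "ACGT" * 700  # 2800 nt stand-in sequence
region = subsequence_1based(placeholder, 2535, 2635)
print(len(region))                                    # 101
print(contains_required_region(placeholder, region))  # True
```

Note that the range 2535-2635 spans 101 nucleotides, not 100, because both end positions are included.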

According to a further aspect of the present invention there is 
provided a method of expressing a polynucleotide sequence comprising the 
step of ligating the polynucleotide sequence into the nucleic acid construct 
described above, downstream of the polynucleotide sequence derived from
SEQ ID NOs:42 or 43. 

According to a further aspect of the present invention there is 
provided a recombinant protein comprising a polypeptide having 
heparanase catalytic activity. 

According to further features in preferred embodiments of the 
invention described below, the polypeptide includes at least a portion of 
SEQ ID NOs: 10, 14 or 44. 

According to still further features in the described preferred 
embodiments the protein is encoded by a polynucleotide hybridizable with 
SEQ ID NOs: 9, 13, 42, 43 or a portion thereof at 68 °C in 6 x SSC, 1 % 
SDS, 5 x Denhardt's, 10 % dextran sulfate, 100 µg/ml salmon sperm DNA,
and 32P labeled probe and wash at 68 °C with 3 x SSC and 0.1 % SDS.

According to still further features in the described preferred 
embodiments the protein is encoded by a polynucleotide at least 60 % 
identical with SEQ ID NOs: 9, 13, 42, 43 or portions thereof as determined 
using the Bestfit procedure of the DNA sequence analysis software package 
developed by the Genetics Computer Group (GCG) at the University of
Wisconsin (gap creation penalty - 12, gap extension penalty - 4).

According to a further aspect of the present invention there is
provided a pharmaceutical composition comprising, as an active ingredient, 
the recombinant protein herein described. 

According to a further aspect of the present invention there is 
provided a method of identifying a chromosome region harboring a 
heparanase gene in a chromosome spread comprising the steps of (a) 
hybridizing the chromosome spread with a tagged polynucleotide probe 
encoding heparanase; (b) washing the chromosome spread, thereby 
removing excess of non-hybridized probe; and (c) searching for signals 
associated with the hybridized tagged polynucleotide probe, wherein 
detected signals are indicative of a chromosome region harboring a
heparanase gene. 

According to a further aspect of the present invention there is 
provided a method of in vivo eliciting anti-heparanase antibodies 
comprising the steps of administering a nucleic acid construct including a 
polynucleotide segment corresponding to at least a portion of SEQ ID 
NOs:9, 13 or 43 and a promoter for directing the expression of said 
polynucleotide segment in vivo. Accordingly, there is provided also a DNA 
vaccine for in vivo eliciting anti-heparanase antibodies comprising a nucleic 
acid construct including a polynucleotide segment corresponding to at least 
a portion of SEQ ID NOs:9, 13 or 43 and a promoter for directing the 
expression of said polynucleotide segment in vivo. 

The present invention can be used to develop new drugs to inhibit 
tumor cell metastasis, inflammation and autoimmunity. The identification 
of the hpa gene encoding the heparanase enzyme enables the production of
a recombinant enzyme in heterologous expression systems. Additional 
features, advantages, uses and applications of the present invention in 
biological science and in diagnostic and therapeutic medicine are described 
hereinafter. 

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is herein described, by way of example only, with
reference to the accompanying drawings, wherein:

FIG. 1 presents the nucleotide sequence and deduced amino acid
sequence of hpa cDNA. A single nucleotide difference at position 799 (A
to T) between the EST (Expressed Sequence Tag) and the PCR amplified
cDNA (reverse transcribed RNA) and the resulting amino acid substitution
(Tyr to Phe) are indicated above and below the substituted unit,
respectively. Cysteine residues and the polyadenylation consensus
sequence are underlined. The asterisk denotes the stop codon TGA.

FIG. 2 demonstrates degradation of soluble sulfate labeled HSPG 
substrate by lysates of High Five cells infected with pFhpal virus. Lysates 
of High Five cells that were infected with pFhpal virus (•) or control pF2 
virus (□) were incubated (18 h, 37 °C) with sulfate labeled ECM-derived
soluble HSPG (peak I). The incubation medium was then subjected to gel 
filtration on Sepharose 6B. Low molecular weight HS degradation 
fragments (peak II) were produced only during incubation with the pFhpal 
infected cells, but there was no degradation of the HSPG substrate (^) by 
lysates of pF2 infected cells. 

FIGs. 3a-b demonstrate degradation of soluble sulfate labeled HSPG 
substrate by the culture medium of pFhpal and pFhpa4 infected cells.
Culture media of High Five cells infected with pFhpal (3a) or pFhpa4 (3b)
viruses (•), or with control viruses (a) were incubated (18 h, 37 °C) with
sulfate labeled ECM-derived soluble HSPG (peak I, ^). The incubation
media were then subjected to gel filtration on Sepharose 6B. Low
molecular weight HS degradation fragments (peak II) were produced only
during incubation with the hpa gene containing viruses. There was no 
degradation of the HSPG substrate by the culture medium of cells infected 
with control viruses. 

FIG. 4 presents size fractionation of heparanase activity expressed by
pFhpal infected cells. Culture medium of pFhpal infected High Five cells
was applied onto a 50 kDa cut-off membrane. Heparanase activity
(conversion of the peak I substrate, (^) into peak II HS degradation
fragments) was found in the high (> 50 kDa) (•), but not low (< 50 kDa) (o)
molecular weight compartment.

FIGs. 5a-b demonstrate the effect of heparin on heparanase activity
expressed by pFhpal and pFhpa4 infected High Five cells. Culture media
of pFhpal (5a) and pFhpa4 (5b) infected High Five cells were incubated
(18 h, 37 °C) with sulfate labeled ECM-derived soluble HSPG (peak I, ^) in
the absence (•) or presence (a) of 10 µg/ml heparin. Production of low
molecular weight HS degradation fragments was completely abolished in
the presence of heparin, a potent inhibitor of heparanase activity (6, 7).

FIGs. 6a-b demonstrate degradation of sulfate labeled intact ECM by
virus infected High Five and Sf21 cells. High Five (6a) and Sf21 (6b) cells
were plated on sulfate labeled ECM and infected (48 h, 28 °C) with pFhpa4
(•) or control pF1 (□) viruses. Control non-infected Sf21 cells (r) were
plated on the labeled ECM as well. The pH of the cultured medium was
adjusted to 6.0 - 6.2 followed by 24 h incubation at 37 °C. Sulfate labeled
material released into the incubation medium was analyzed by gel filtration
on Sepharose 6B. HS degradation fragments were produced only by cells
infected with the hpa containing virus.

FIGs. 7a-b demonstrate degradation of sulfate labeled intact ECM by
virus infected cells. High Five (7a) and Sf21 (7b) cells were plated on
sulfate labeled ECM and infected (48 h, 28 °C) with pFhpa4 (•) or control
pF1 (□) viruses. Control non-infected Sf21 cells (r) were plated on labeled
ECM as well. The pH of the cultured medium was adjusted to 6.0 - 6.2,
followed by 48 h incubation at 28 °C. Sulfate labeled degradation
fragments released into the incubation medium were analyzed by gel
filtration on Sepharose 6B. HS degradation fragments were produced only
by cells infected with the hpa containing virus.

FIGs. 8a-b demonstrate degradation of sulfate labeled intact ECM by
the culture medium of pFhpa4 infected cells. Culture media of High Five
(8a) and Sf21 (8b) cells that were infected with pFhpa4 (•) or control pF1
(□) viruses were incubated (48 h, 37 °C, pH 6.0) with intact sulfate labeled
ECM. The ECM was also incubated with the culture medium of control
non-infected Sf21 cells (r). Sulfate labeled material released into the
reaction mixture was subjected to gel filtration analysis. Heparanase
activity was detected only in the culture medium of pFhpa4 infected cells.

FIGs. 9a-b demonstrate the effect of heparin on heparanase activity
in the culture medium of pFhpa4 infected cells. Sulfate labeled ECM was
incubated (24 h, 37 °C, pH 6.0) with culture medium of pFhpa4 infected
High Five (9a) and Sf21 (9b) cells in the absence (•) or presence (V) of 10
µg/ml heparin. Sulfate labeled material released into the incubation
medium was subjected to gel filtration on Sepharose 6B. Heparanase
activity (production of peak II HS degradation fragments) was completely
inhibited in the presence of heparin.

FIGs. 10a-b demonstrate purification of recombinant heparanase on
heparin-Sepharose. Culture medium of Sf21 cells infected with pFhpa4
virus was subjected to heparin-Sepharose chromatography. Elution of
fractions was performed with 0.35 - 2 M NaCl gradient (<>). Heparanase
activity in the eluted fractions is demonstrated in Figure 10a (•). Fractions
15-28 were subjected to 15 % SDS-polyacrylamide gel electrophoresis
followed by silver nitrate staining. A correlation is demonstrated between a
major protein band (MW 63,000) in fractions 19 - 24 and heparanase
activity.

FIGs. 11a-b demonstrate purification of recombinant heparanase on
a Superdex 75 gel filtration column. Active fractions eluted from
heparin-Sepharose (Figure 10a) were pooled, concentrated and applied onto
a Superdex 75 FPLC column. Fractions were collected and aliquots of each
fraction were tested for heparanase activity (C, Figure 11a) and analyzed by
SDS-polyacrylamide gel electrophoresis followed by silver nitrate staining
(Figure 11b). A correlation is seen between the appearance of a major
protein band (MW 63,000) in fractions 4 - 7 and heparanase activity.

FIGs. 12a-e demonstrate expression of the hpa gene by RT-PCR 
with total RNA from human embryonal tissues (12a), human extra- 
embryonal tissues (12b) and cell lines from different origins (12c-e). RT- 
PCR products using hpa specific primers (I), primers for GAPDH 
housekeeping gene (II), and control reactions without reverse transcriptase
demonstrating absence of genomic DNA or other contamination in RNA 
samples (III). M- DNA molecular weight marker VI (Boehringer 
Mannheim). For 12a: lane 1 - neutrophil cells (adult), lane 2 - muscle, lane 
3 - thymus, lane 4 - heart, lane 5 - adrenal. For 12b: lane 1 - kidney, lane 2 - 
placenta (8 weeks), lane 3 - placenta (11 weeks), lanes 4-7 - mole (complete
hydatidiform mole), lane 8 - cytotrophoblast cells (freshly isolated), lane 9 - 
cytotrophoblast cells (1.5 h in vitro), lane 10 - cytotrophoblast cells (6 h in 
vitro), lane 11 - cytotrophoblast cells (18 h in vitro), lane 12 - 
cytotrophoblast cells (48 h in vitro). For 12c: lane 1 - JAR bladder cell line, 
lane 2 - NCITT testicular tumor cell line, lane 3 - SW-480 human hepatoma 
cell line, lane 4 - HTR (cytotrophoblasts transformed by SV40), lane 5 -
HPTLP-I hepatocellular carcinoma cell line, lane 6 - EJ-28 bladder 
carcinoma cell line. For 12d: lane 1 - SK-hep-1 human hepatoma cell line, 
lane 2 - DAMI human megakaryocytic cell line, lane 3 - DAMI cell line + 
PMA, lane 4 - CHRP cell line + PMA, lane 5 - CHRP cell line. For 12e: 
lane 1 - ABAE bovine aortic endothelial cells, lane 2- 1063 human ovarian 
cell line, lane 3 - human breast carcinoma MDA435 cell line, lane 4 - 
human breast carcinoma MDA231 cell line. 

FIG. 13 presents a comparison between nucleotide sequences of the
human hpa and a mouse EST cDNA fragment (SEQ ID NO: 12) which is 80
% homologous to the 3' end (starting at nucleotide 1066 of SEQ ID NO:9)
of the human hpa. The aligned termination codons are underlined.

FIG. 14 demonstrates the chromosomal localization of the hpa gene.
PCR products of DNA derived from somatic cell hybrids and of genomic
DNA of hamster, mouse and human were separated on 0.7 % agarose gel
following amplification with hpa specific primers. Lane 1 - Lambda DNA
digested with BstEII, lane 2 - no DNA control, lanes 3 - 29, PCR
amplification products. Lanes 3-5 - human, mouse and hamster genomic
DNA, respectively. Lanes 6-29, human monochromosomal somatic cell
hybrids representing chromosomes 1-22 and X and Y, respectively. Lane
30 - Lambda DNA digested with BstEII. An amplification product of
approximately 2.8 Kb is observed only in lanes 5 and 9, representing human
genomic DNA and DNA derived from cell hybrid carrying human
chromosome 4, respectively. These results demonstrate that the hpa gene is
localized in human chromosome 4.
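
The lane assignment for the monochromosomal hybrid panel described for FIG. 14 can be sketched as a simple lookup. This is only an illustrative sketch: the function name and constants are hypothetical, and it assumes the hybrid lanes 6-29 are loaded in the order stated in the legend (chromosomes 1-22, then X, then Y).

```python
# Hypothetical helper: map a gel lane of the hybrid panel to the human
# chromosome carried by the hybrid loaded in that lane, following the
# legend (lanes 6-29 = chromosomes 1-22, X and Y, respectively).
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y"]
FIRST_HYBRID_LANE = 6  # assumption taken from the legend text


def chromosome_for_lane(lane):
    """Return the chromosome label for a hybrid-panel lane number."""
    index = lane - FIRST_HYBRID_LANE
    if not 0 <= index < len(CHROMOSOMES):
        raise ValueError(f"lane {lane} is not a hybrid-panel lane")
    return CHROMOSOMES[index]


# Lane in which the approximately 2.8 Kb product is reported among hybrids.
print(chromosome_for_lane(9))
```

Under this assumed loading order, the hybrid lane in which the amplification product is reported corresponds to the chromosome named in the legend.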

FIG. 15 demonstrates the genomic exon-intron structure of the
human hpa locus (top) and the relative positions of the lambda clones used
as sequencing templates to sequence the locus (below). The vertical
rectangles represent exons (E) and the horizontal lines therebetween
represent introns (I), upstream (U) and downstream (D) regions.
Continuous lines represent DNA fragments, which were used for sequence
analysis. The discontinuous line in lambda 6 represents a region, which
overlaps with lambda 8 and hence was not analyzed. The plasmid contains
a PCR product, which bridges the gap between L3 and L6.

FIG. 16 presents the nucleotide sequence of the genomic region of 
the hpa gene. Exon sequences appear in upper case and intron sequences in 
lower case. The deduced amino acid sequence of the exons is printed below
the nucleotide sequence. Two predicted transcription start sites are shown 
in bold. 

FIG. 17 presents an alignment of the amino acid sequences of human
heparanase, mouse and partial sequences of rat homologues. The human
and the mouse sequences were determined by sequence analysis of the
isolated cDNAs. The rat sequence is derived from two different EST
clones, which represent two different regions (5' and 3') of the rat hpa
cDNA. The human sequence and the amino acids in the mouse and rat
homologues, which are identical to the human sequence, appear in bold.

FIG. 18 presents a heparanase Zoo blot. Ten micrograms of genomic
DNA from various sources were digested with EcoRI and separated on 0.7
% agarose - TBE gel. Following electrophoresis, the gel was treated with
HCl and then with NaOH and the DNA fragments were downward
transferred to a nylon membrane (Hybond N+, Amersham) with 0.4 N
NaOH. The membrane was hybridized with a 1.6 Kb DNA probe that
contained the entire hpa cDNA. Lane order: H - Human; M - Mouse; Rt -
Rat; P - Pig; Cw - Cow; Hr - Horse; S - Sheep; Rb - Rabbit; D - Dog; Ch
- Chicken; F - Fish. Size markers (Lambda BstEII) are shown on the left.

FIG. 19 demonstrates the secondary structure prediction for 
heparanase performed using the PHD server - Profile network Prediction 
Heidelberg. H - helix, E - extended (beta strand). The glutamic acid 
predicted as the proton donor is marked by an asterisk and the possible
nucleophiles are underlined.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention is of a polynucleotide or nucleic acid, referred 
to hereinbelow interchangeably as hpa, hpa cDNA or hpa gene or identified 
by its SEQ ID NOs, encoding a polypeptide having heparanase activity, 
vectors or nucleic acid constructs including same and which are used for 
over-expression or antisense inhibition of heparanase, genetically modified 
cells expressing same, recombinant protein having heparanase activity, 
antisense oligonucleotides and ribozymes for heparanase modulation, and 
heparanase promoter sequences which can be used to direct the expression 
of desired genes. 

Before explaining at least one embodiment of the invention in detail,
it is to be understood that the invention is not limited in its application to the
details of construction and the arrangement of the components set forth in 
the following description or illustrated in the drawings. The invention is 
capable of other embodiments or of being practiced or carried out in various 
ways. Also, it is to be understood that the phraseology and terminology 
employed herein is for the purpose of description and should not be 
regarded as limiting. 

Cloning of the human and mouse hpa genes, cDNAs and genomic 
sequence (for human), encoding heparanase and expressing recombinant 
heparanase by transfected cells is reported herein. These are the first 
mammalian heparanase genes to be cloned. 

A purified preparation of heparanase isolated from human hepatoma
cells was subjected to tryptic digestion and microsequencing. 

The YGPDVGQPR (SEQ ID NO: 8) sequence revealed was used to
screen EST databases for homology to the corresponding back translated
DNA sequences. Two closely related EST sequences were identified and
were thereafter found to be identical.
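
Back translation of a peptide into its possible coding sequences can be illustrated with a short sketch. It assumes the standard genetic code and uses hypothetical helper names; it is not the database search procedure actually employed, only an indication of how degenerate the back translated query for the YGPDVGQPR peptide is.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map each amino acid (one-letter code) to the codons that encode it.
CODONS_FOR = {}
for (b1, b2, b3), amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS_FOR.setdefault(amino_acid, []).append(b1 + b2 + b3)


def back_translate(peptide):
    """Return, per residue, the list of codons that could encode it."""
    return [CODONS_FOR[residue] for residue in peptide.upper()]


def count_coding_sequences(peptide):
    """Count the distinct DNA sequences that encode the peptide."""
    total = 1
    for codons in back_translate(peptide):
        total *= len(codons)
    return total


peptide = "YGPDVGQPR"  # SEQ ID NO: 8
for residue, codons in zip(peptide, back_translate(peptide)):
    print(residue, sorted(codons))
print(count_coding_sequences(peptide))
```

Because several residues of the peptide are encoded by four or six codons, the number of candidate DNA sequences is large, which is why a homology search against all back translated sequences, rather than a single probe, is the natural approach.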

Both clones contained an insert of 1020 bp which includes an open 
reading frame of 973 bp followed by a 3' untranslated region of 27 bp and a 
Poly A tail, whereas a translation start site was not identified.

Cloning of the missing 5' end was performed by PCR amplification
of DNA from placenta Marathon RACE cDNA composite using primers 
selected according to the EST clones sequence and the linkers of the 
composite. 

A 900 bp PCR fragment, partially overlapping with the identified 3'
encoding EST clones was obtained. The joined cDNA fragment (hpa),
1721 bp long (SEQ ID NO:9), contained an open reading frame which
encodes, as shown in Figure 1 and SEQ ID NO: 11, a polypeptide of 543
amino acids (SEQ ID NO: 10) with a calculated molecular weight of 61,192
daltons.
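
A calculated molecular weight of this kind is conventionally obtained by summing average residue masses and adding one water molecule. The following is a minimal sketch of that arithmetic, assuming standard average residue masses; the function name is hypothetical, and the short tryptic peptide is used as input only because the full sequence of SEQ ID NO: 10 is not reproduced here.

```python
# Average residue masses in daltons (free amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01528  # added once for the free N- and C-termini


def molecular_weight(sequence):
    """Return the average molecular weight of an unmodified polypeptide."""
    total = WATER_MASS
    for residue in sequence.upper():
        total += AVERAGE_RESIDUE_MASS[residue]
    return total


# Illustrative input: the tryptic peptide of SEQ ID NO: 8.
print(round(molecular_weight("YGPDVGQPR"), 1))
```

Such a value ignores post translational modifications such as glycosylation or signal peptide removal, so the apparent size of an expressed protein on a gel may differ from the calculated figure.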

A single nucleotide difference at position 799 (A to T) between the 
EST clones and the PCR amplified cDNA was observed. This difference 
results in a single amino acid substitution (Tyr to Phe) (Figure 1). 
Furthermore, the published EST sequences contained an unidentified 
nucleotide, which following DNA sequencing of both the EST clones was
resolved into two nucleotides (G and C at positions 1630 and 1631 in SEQ 
ID NO:9, respectively). 

The ability of the hpa gene product to catalyze degradation of 
heparan sulfate in an in vitro assay was examined by expressing the entire 
open reading frame in insect cells, using the Baculovirus expression system.

Extracts and conditioned media of cells infected with virus 
containing the hpa gene, demonstrated a high level of heparan sulfate 
degradation activity both towards soluble ECM-derived HSPG and intact 
ECM, which was inhibited by heparin, while cells infected with a similar 
construct containing no hpa gene had no such activity, nor did non-infected
cells. 

The expression pattern of hpa RNA in various tissues and cell lines
was investigated using RT-PCR. It was found to be expressed only in
tissues and cells previously known to have heparanase activity.

Cloning an extended 5' sequence was enabled from the human
SK-hep1 cell line by PCR amplification using the Marathon RACE. The 5'
extended sequence of the SK-hep1 hpa cDNA was assembled with the
sequence of the hpa cDNA isolated from human placenta (SEQ ID NO:9).
The assembled sequence contained an open reading frame, SEQ ID NOs: 13
and 15, which encodes, as shown in SEQ ID NOs: 14 and 15, a polypeptide
of 592 amino acids, with a calculated molecular weight of 66,407 daltons.
This open reading frame was shown to direct the expression of catalytically
active heparanase in a mammalian cell expression system. The expressed
heparanase was detectable by anti-heparanase antibodies in Western blot
analysis.

A panel of monochromosomal human/CHO and human/mouse 
somatic cell hybrids was used to localize the human heparanase gene to 
human chromosome 4. The newly isolated heparanase sequence can 
therefore be used to identify a chromosome region harboring a human 
heparanase gene in a chromosome spread. 

The hpa cDNA was then used as a probe to screen a human
genomic library. Several phages were positive. These phages were
analyzed and were found to cover most of the hpa locus, except for a small
portion which was recovered by bridging PCR. The hpa locus covers about
50,000 bp. The hpa gene includes 12 exons separated by 11 introns.

RT-PCR performed on a variety of cells revealed alternatively 
spliced hpa transcripts. 

The amino acid sequence of human heparanase was used to search
for homologous sequences in the DNA and protein databases. Several
human ESTs were identified, as well as mouse sequences highly
homologous to human heparanase. The following mouse ESTs were
identified: AA177901, AA674378, AA67997, AA047943, AA690179 and
AI122034, all of which share an identical sequence and correspond to amino
acids 336-543 of the human heparanase sequence. The entire mouse
heparanase cDNA was cloned, based on the nucleotide sequence of the mouse
ESTs, using Marathon cDNA libraries. The mouse and the human hpa genes
share an average homology of 78 % between the nucleotide sequences and
81 % similarity between the deduced amino acid sequences. hpa
homologous sequences from rat were also uncovered (ESTs AI060284 and
AI237828).

Homology search of heparanase amino acid sequence against the
DNA and the protein databases and prediction of its protein secondary
structure enabled the identification of candidate amino acids that
participate in the heparanase active site.

Expression of hpa antisense in mammalian cell lines resulted in
an about five-fold decrease in the number of recoverable cells as compared
to controls.

Human hpa cDNA was shown to hybridize with genomic DNAs of a
variety of mammalian species and with that of an avian species.

The human and mouse hpa promoters were identified and the human 
promoter was tested positive in directing the expression of a reporter gene. 

Thus, according to the present invention there is provided an isolated 
nucleic acid comprising a genomic, complementary or composite 
polynucleotide sequence encoding a polypeptide having heparanase
catalytic activity. 

The phrase "composite polynucleotide sequence" refers to a 
sequence which includes exonal sequences required to encode the 
polypeptide having heparanase activity, as well as any number of intronal 
sequences. The intronal sequences can be of any source and typically will
include conserved splicing signal sequences. Such intronal sequences may 
further include cis acting expression regulatory elements. 

The term "heparanase catalytic activity" or its equivalent term 
"heparanase activity" both refer to a mammalian endoglycosidase 
hydrolyzing activity which is specific for heparan or heparan sulfate 
proteoglycan substrates, as opposed to the activity of bacterial enzymes 
(heparinase I, II and III) which degrade heparin or heparan sulfate by means 
of β-elimination (37).

According to a preferred embodiment of the present invention the 
polynucleotide or a portion thereof is hybridizable with SEQ ID NOs: 9, 13, 
42, 43 or a portion thereof at 68 °C in 6 x SSC, 1 % SDS, 5 x Denhardt's, 10
% dextran sulfate, 100 µg/ml salmon sperm DNA, and 32P labeled probe
and wash at 68 °C with 3, 2, 1, 0.5 or 0.1 x SSC and 0.1 % SDS.

According to another preferred embodiment of the present invention 
the polynucleotide or a portion thereof is at least 60 %, preferably at least 
65 %, more preferably at least 70 %, more preferably at least 75 %, more 
preferably at least 80 %, more preferably at least 85 %, more preferably at 
least 90 %, most preferably, 95-100 % identical with SEQ ID NOs: 9, 13, 
42, 43 or portions thereof as determined using the Bestfit procedure of the
DNA sequence analysis software package developed by the Genetic
Computer Group (GCG) at the University of Wisconsin (gap creation
penalty - 12, gap extension penalty - 4 - which are the default parameters).

According to another preferred embodiment of the present invention
the polypeptide encoded by the polynucleotide sequence is as set forth in
SEQ ID NOs:10, 14, 44 or portions thereof having heparanase catalytic
activity. Such portions are expected to include amino acids Asp-Glu
224-225 (SEQ ID NO: 10), which can serve as proton donors and glutamic acid
343 or 396 which can serve as a nucleophile.

According to another preferred embodiment of the present invention
the polypeptide encoded by the polynucleotide sequence is at least 60 %,
preferably at least 65 %, more preferably at least 70 %, more preferably at
least 75 %, more preferably at least 80 %, more preferably at least 85 %,
more preferably at least 90 %, most preferably, 95-100 % homologous (both
similar and identical amino acids) to SEQ ID NOs:10, 14, 44 or portions
thereof as determined with the Smith-Waterman algorithm, using the
Bioaccelerator platform developed by Compugen (gapop: 10.0, gapext: 0.5,
matrix: blosum62, see also the description to Figure 17).
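
The Smith-Waterman local alignment referred to above can be sketched with affine gap penalties as follows. This is a simplified illustration and not the Bioaccelerator implementation: a flat match/mismatch score stands in for the blosum62 matrix, the match and mismatch values are arbitrary assumptions, and the sketch adopts the convention that the first gapped position costs the opening penalty and each further position costs the extension penalty.

```python
def smith_waterman_score(a, b, match=5.0, mismatch=-4.0,
                         gap_open=10.0, gap_extend=0.5):
    """Return the best local alignment score of a and b (affine gaps)."""
    neg_inf = float("-inf")
    rows, cols = len(a) + 1, len(b) + 1
    # best[i][j]: best local score of an alignment ending at a[i-1], b[j-1]
    # gap_a[i][j]: best score ending with b[j-1] aligned against a gap
    # gap_b[i][j]: best score ending with a[i-1] aligned against a gap
    best = [[0.0] * cols for _ in range(rows)]
    gap_a = [[neg_inf] * cols for _ in range(rows)]
    gap_b = [[neg_inf] * cols for _ in range(rows)]
    top = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            gap_a[i][j] = max(gap_a[i][j - 1] - gap_extend,
                              best[i][j - 1] - gap_open)
            gap_b[i][j] = max(gap_b[i - 1][j] - gap_extend,
                              best[i - 1][j] - gap_open)
            pair = match if a[i - 1] == b[j - 1] else mismatch
            best[i][j] = max(0.0, best[i - 1][j - 1] + pair,
                             gap_a[i][j], gap_b[i][j])
            top = max(top, best[i][j])
    return top


# Hypothetical toy sequences; the second carries a one-residue insertion.
print(smith_waterman_score("ACGTACGT", "ACGTTACGT"))
```

A percent homology figure of the kind recited above would additionally require a traceback to recover the aligned positions and a substitution matrix to decide which non-identical pairs count as similar; both are omitted from this sketch.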

Further according to the present invention there is provided a nucleic 
acid construct comprising the isolated nucleic acid described herein. The 
construct may and preferably further include an origin of replication and 

The construct or vector can be of any type. It may be a phage which 
infects bacteria or a virus which infects eukaryotic cells. It may also be a 
plasmid, phagemid, cosmid, bacmid or an artificial chromosome. 

Further according to the present invention there is provided a host 
cell comprising the nucleic acid construct described herein. The host cell 
can be of any type. It may be a prokaryotic cell, a eukaryotic cell, a cell
line, or a cell as a portion of an organism. The polynucleotide encoding 
heparanase can be permanently or transiently present in the cell. In other 
words, genetically modified cells obtained following stable or transient 
transfection, transformation or transduction are all within the scope of the 
present invention. The polynucleotide can be present in the cell in low copy 
(say 1-5 copies) or high copy number (say 5-50 copies or more). It may be 
integrated in one or more chromosomes at any location or be present as
extrachromosomal material.

The present invention is further directed at providing a heparanase 
over-expression system which includes a cell overexpressing heparanase 
catalytic activity. The cell may be a genetically modified host cell 
transiently or stably transfected or transformed with any suitable vector 
which includes a polynucleotide sequence encoding a polypeptide having
heparanase activity and suitable promoter and enhancer sequences to
direct over-expression of heparanase. However, the overexpressing cell
may also be a product of an insertion (e.g., via homologous recombination)
of a promoter and/or enhancer sequence downstream to the endogenous
heparanase gene of the expressing cell, which will direct over-expression
from the endogenous gene. 

The term "over-expression" as used herein in the specification and 
claims below refers to a level of expression which is higher than a basal 
level of expression typically characterizing a given cell under otherwise 
identical conditions.

According to another aspect the present invention provides an 
antisense oligonucleotide comprising a polynucleotide or a polynucleotide 
analog of at least 10, preferably 11-15, more preferably 16-17, more 
preferably 18, more preferably 19-25, more preferably 26-35, most 
preferably 35-100 bases being hybridizable in vivo, under physiological
conditions, with a portion of a polynucleotide strand encoding a polypeptide
having heparanase catalytic activity. The antisense oligonucleotide can be
used for downregulating heparanase activity by in vivo administration
thereof to a patient. As such, the antisense oligonucleotide according to the
present invention can be used to treat types of cancers which are
characterized by impaired (over) expression of heparanase, and are 
dependent on the expression of heparanase for proliferating or forming 
metastases. 

The antisense oligonucleotide can be DNA or RNA or even include 
nucleotide analogs, examples of which are provided in the Background
section hereinabove. The antisense oligonucleotide according to the present
invention can be synthetic and is preferably prepared by solid phase
synthesis. In addition, it can be of any desired length which still provides
specific base pairing (e.g., 8 or 10, preferably more, nucleotides long) and it
can include mismatches that do not hamper base pairing under physiological
conditions. 
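
Designing such an antisense oligonucleotide amounts to taking the reverse complement of a chosen segment of the sense (coding) strand. The following is a minimal sketch under stated assumptions: the function name and the example sequence are hypothetical and are not taken from SEQ ID NO:9, and the length check simply mirrors the minimum of 10 bases recited above.

```python
# Complement table for DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
MIN_LENGTH = 10  # minimum oligonucleotide length recited in the text


def antisense_oligo(sense_strand, start, length):
    """Return the antisense DNA oligo for sense_strand[start:start+length].

    start is a 0-based offset into the sense strand; the result is written
    5' to 3' and base pairs with the selected segment of the sense strand.
    """
    if length < MIN_LENGTH:
        raise ValueError("oligonucleotide shorter than the minimum length")
    target = sense_strand[start:start + length]
    if len(target) < length:
        raise ValueError("target window runs past the end of the sequence")
    return target.translate(COMPLEMENT)[::-1]


# Hypothetical sense-strand fragment used purely for illustration.
example_sense = "ATGGCCAAGTTT"
print(antisense_oligo(example_sense, 0, 12))
```

In practice the choice of target window would also weigh accessibility of the target RNA and specificity against other transcripts, which this sketch does not attempt to model.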

Further according to the present invention there is provided a 
pharmaceutical composition comprising the antisense oligonucleotide 
herein described and a pharmaceutically acceptable carrier. The carrier can 
be, for example, a liposome loadable with the antisense oligonucleotide.

According to a preferred embodiment of the present invention the 
antisense oligonucleotide further includes a ribozyme sequence. The 
ribozyme sequence serves to cleave a heparanase RNA molecule to which
the antisense oligonucleotide binds, to thereby downregulate heparanase
expression.

Further according to the present invention there is provided an 
antisense nucleic acid construct comprising a promoter sequence and a 
polynucleotide sequence directing the synthesis of an antisense RNA
sequence of at least 10 bases being hybridizable in vivo, under physiological
conditions, with a portion of a polynucleotide strand encoding a polypeptide
having heparanase catalytic activity. Like the antisense oligonucleotide, the
antisense construct can be used for downregulating heparanase activity by
in vivo administration thereof to a patient. As such, the antisense construct,
like the antisense oligonucleotide, according to the present invention can be 
used to treat types of cancers which are characterized by impaired (over) 
expression of heparanase, and are dependent on the expression of 
heparanase for proliferating or forming metastases. 

Thus, further according to the present invention there is provided a 
pharmaceutical composition comprising the antisense construct herein 
described and a pharmaceutically acceptable carrier. The carrier can be, for 
example, a liposome loadable with the antisense construct. 

Formulations for topical administration may include, but are not 
limited to, lotions, ointments, gels, creams, suppositories, drops, liquids, 
sprays and powders. Conventional pharmaceutical carriers, aqueous, 
powder or oily bases, thickeners and the like may be necessary or desirable. 
Coated condoms, stents, active pads, and other medical devices may also be 
useful. Compositions for oral administration include powders or granules, 
suspensions or solutions in water or non-aqueous media, sachets, capsules 
or tablets. Thickeners, diluents, flavorings, dispersing aids, emulsifiers or 
binders may be desirable. Formulations for parenteral administration may 
include, but are not limited to, sterile aqueous solutions which may also 
contain buffers, diluents and other suitable additives. 

Dosing is dependent on severity and responsiveness of the condition 
to be treated, but will normally be one or more doses per day, week or 
month with course of treatment lasting from several days to several months 
or until a cure is effected or a diminution of disease state is achieved. 
Persons ordinarily skilled in the art can easily determine optimum dosages, 
dosing methodologies and repetition rates. 

Further according to the present invention there is provided a nucleic
acid construct comprising a polynucleotide sequence functioning as a
promoter, wherein the polynucleotide sequence is derived from SEQ ID NO:42
and includes at least nucleotides 2135-2635, preferably 2235-2635, more
preferably 2335-2635, more preferably 2435-2635, most preferably
2535-2635 thereof, or is derived from SEQ ID NO:43 and includes at least
nucleotides 1-420, preferably 120-420, more preferably 220-420, most
preferably 320-420, thereof. These nucleotides are shown in the example
section that follows to direct the synthesis of a reporter gene in
transformed cells. Thus, further according to the present invention there is
provided a method of expressing a polynucleotide sequence comprising the
step of ligating the polynucleotide sequence downstream to either of the
promoter sequences described herein. Heparanase promoters can be isolated
from a variety of mammalian and other species by cloning genomic regions
present 5' to the coding sequence thereof. This is readily achievable by one
ordinarily skilled in the art using the heparanase polynucleotides described
herein, which are shown in the Examples section that follows to participate
in efficient cross species hybridization.
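
The nested nucleotide ranges recited above use 1-based, inclusive sequence-listing coordinates. Extracting such a fragment from a sequence held in memory can be sketched as follows; the function name is hypothetical and a placeholder string stands in for SEQ ID NO:42, whose actual sequence is not reproduced here.

```python
# Nested promoter ranges recited for SEQ ID NO:42 (1-based, inclusive).
SEQ42_PROMOTER_RANGES = [
    (2135, 2635), (2235, 2635), (2335, 2635), (2435, 2635), (2535, 2635),
]


def subsequence(sequence, first, last):
    """Return nucleotides first..last (1-based, inclusive) of sequence."""
    if not 1 <= first <= last <= len(sequence):
        raise ValueError("coordinates fall outside the sequence")
    # Python slices are 0-based and end-exclusive, hence first - 1.
    return sequence[first - 1:last]


# Placeholder sequence of sufficient length; NOT the real SEQ ID NO:42.
placeholder = "N" * 2635
for first, last in SEQ42_PROMOTER_RANGES:
    fragment = subsequence(placeholder, first, last)
    print(first, last, len(fragment))
```

The explicit conversion guards against the common off-by-one error when moving between sequence-listing coordinates and 0-based string indices.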

Further according to the present invention there is provided a 
recombinant protein comprising a polypeptide having heparanase catalytic 
activity. The protein according to the present invention includes
modifications known as post translational modifications, including, but not 
limited to, proteolysis (e.g., removal of a signal peptide and of a pro- or 
preprotein sequence), methionine modification, glycosylation, alkylation 
(e.g., methylation), acetylation, etc. According to preferred embodiments 
the polypeptide includes at least a portion of SEQ ID NOs:10, 14 or 44,
which portion has heparanase catalytic activity. According to preferred
embodiments of the present invention the protein is encoded by any of the 
above described isolated nucleic acids. Further according to the present 
invention there is provided a pharmaceutical composition comprising, as an 
active ingredient, the recombinant protein described herein. 

The recombinant protein may be purified by any conventional 
protein purification procedure close to homogeneity and/or be mixed with 
additives. The recombinant protein may be manufactured using any of the 
genetically modified cells described above, which include any of the 
expression nucleic acid constructs described herein. The recombinant 
protein may be in any form. It may be in a crystallized form, a dehydrated 
powder form or in solution. The recombinant protein may be useful in
obtaining pure heparanase, which in turn may be useful in eliciting
anti-heparanase antibodies, either poly or monoclonal antibodies, and as a
screening active ingredient in an assay or system for screening
heparanase inhibitors or anti-heparanase drugs.

Further according to the present invention there is provided a method 
of identifying a chromosome region harboring a human heparanase gene in 
a chromosome spread, the method is executed implementing the following 
method steps, in which in a first step the chromosome spread (either 
interphase or metaphase spread) is hybridized with a tagged polynucleotide 
probe encoding heparanase. The tag is preferably a fluorescent tag. In a 
second step according to the method the chromosome spread is washed, 
thereby removing excess non-hybridized probe. Finally, signals 
associated with the hybridized tagged polynucleotide probe are searched 
for, wherein detected signals are indicative of a chromosome region 
harboring the human heparanase gene. One ordinarily skilled in the art 
would know how to use the sequences disclosed herein in suitable labeling 
reactions and how to use the tagged probes to detect, using in situ 
hybridization, a chromosome region harboring a human heparanase gene. 

Further according to the present invention there is provided a method 
of in vivo eliciting anti-heparanase antibodies comprising the steps of 
administering a nucleic acid construct including a polynucleotide segment 
corresponding to at least a portion of SEQ ID NOs:9, 13 or 43 and a 
promoter for directing the expression of said polynucleotide segment in 
vivo. Accordingly, there is provided also a DNA vaccine for in vivo 
eliciting anti-heparanase antibodies comprising a nucleic acid construct 
including a polynucleotide segment corresponding to at least a portion of 
SEQ ID NOs:9, 13 or 43 and a promoter for directing the expression of said 
polynucleotide segment in vivo. The vaccine optionally further includes a 
pharmaceutically acceptable carrier, such as a virus, liposome or an antigen 
presenting cell. Alternatively, the vaccine is employed as a naked DNA 
vaccine. 

The present invention can be used to develop treatments for various 
diseases, to develop diagnostic assays for these diseases and to provide new 
tools for basic research especially in the fields of medicine and biology. 

Specifically, the present invention can be used to develop new drugs 
to inhibit tumor cell metastasis, inflammation and autoimmunity. The 
identification of the hpa gene encoding for the heparanase enzyme enables 
the production of a recombinant enzyme in heterologous expression 
systems. 

Furthermore, the present invention can be used to modulate 
bioavailability of heparin-binding growth factors, cellular responses to 
heparin-binding growth factors (e.g., bFGF, VEGF) and cytokines (e.g., 
IL-8), cell interaction with plasma lipoproteins, cellular susceptibility to viral, 
protozoa and some bacterial infections, and disintegration of 
neurodegenerative plaques. Recombinant heparanase offers a potential 
treatment for wound healing, angiogenesis, restenosis, atherosclerosis, 
inflammation, neurodegenerative diseases (such as, for example, 
Gerstmann-Straussler Syndrome, Creutzfeldt-Jakob disease, Scrapie and 
Alzheimer's disease) and certain viral and some bacterial and protozoa 
infections. Recombinant heparanase can be used to neutralize plasma 
heparin, as a potential replacement of protamine. 

As used herein, the term "modulate" includes substantially inhibiting, 
slowing or reversing the progression of a disease, substantially ameliorating 
clinical symptoms of a disease or condition, or substantially preventing the 
appearance of clinical symptoms of a disease or condition. A "modulator" 
therefore includes an agent which may modulate a disease or condition. 
Modulation of viral, protozoa and bacterial infections includes any effect 
which substantially interrupts, prevents or reduces any viral, bacterial or 
protozoa activity and/or stage of the virus, bacterium or protozoon life 
cycle, or which reduces or prevents infection by the virus, bacterium or 
protozoon in a subject, such as a human or lower animal. 

As used herein, the term "wound" includes any injury to any portion 
of the body of a subject including, but not limited to, acute conditions such 
as thermal burns, chemical burns, radiation burns, burns caused by excess 
exposure to ultraviolet radiation such as sunburn, damage to bodily tissues 
such as the perineum as a result of labor and childbirth, including injuries 
sustained during medical procedures such as episiotomies, trauma-induced 
injuries including cuts, those injuries sustained in automobile and other 
mechanical accidents, and those caused by bullets, knives and other 
weapons, and post-surgical injuries, as well as chronic conditions such as 
pressure sores, bedsores, conditions related to diabetes and poor circulation, 
and all types of acne, etc. 

Anti-heparanase antibodies, raised against the recombinant enzyme, 
would be useful for immunodetection and diagnosis of micrometastases, 
autoimmune lesions and renal failure in biopsy specimens, plasma samples, 
and body fluids. Such antibodies may also serve as neutralizing agents for 
heparanase activity. 

The genomic heparanase sequences described herein can be used to 
construct knock-in and knock-out constructs. Such constructs include a 
fragment of 10-20 Kb of a heparanase locus and negative and positive 
selection markers and can be used to provide heparanase knock-in and 
knock-out animal models by methods known to the skilled artisan. Such 
animal models can be used for studying the function of heparanase in 
developmental processes, and in normal as well as pathological processes. 
They can also serve as an experimental model for testing drugs and gene 
therapy protocols. The complementary heparanase sequence (cDNA) can 
be used to derive transgenic animals overexpressing heparanase for the 
same purposes. Alternatively, if cloned in the antisense orientation, the 
complementary heparanase sequence (cDNA) can be used to derive 
transgenic animals under-expressing heparanase for the same purposes. 

The heparanase promoter sequences described herein and other cis 
regulatory elements linked to the heparanase locus can be used to regulate 
the expression of genes. For example, these promoters can be used to 
direct the expression of a cytotoxic protein, such as TNF, in tumor cells. It 
will be appreciated that heparanase itself is abnormally expressed under the 
control of its own promoter and other cis acting elements in a variety of 
tumors, and its expression is correlated with metastasis. It is also 
abnormally highly expressed in inflammatory cells. The introns of the 
heparanase gene can be used for the same purpose, as it is known that 
introns, especially upstream introns, include cis acting elements which affect 
expression. A heparanase promoter fused to a reporter protein can be used 
to study/monitor its activity. 

The polynucleotide sequences described herein can also be used to 
provide DNA vaccines which will elicit in vivo anti heparanase antibodies. 
Such vaccines can therefore be used to combat inflammation and cancer. 

Antisense oligonucleotides derived according to the heparanase 
sequences described herein, especially such oligonucleotides supplemented 
with ribozyme activity, can be used to modulate heparanase expression. 
Such oligonucleotides can be from the coding region, from the introns or 
promoter specific. Antisense heparanase nucleic acid constructs can 
similarly function, as well known in the art. 

The heparanase sequences described herein can be used to study the 
catalytic mechanism of heparanase. Carefully selected site directed 
mutagenesis can be employed to provide modified heparanase proteins 
having modified characteristics in terms of, for example, substrate 
specificity, sensitivity to inhibitors, etc. 

While studying heparanase expression in a variety of cell types, 
alternatively spliced transcripts were identified. Such transcripts, if found 
characteristic of certain pathological conditions, can be used as markers for 
such conditions. Such transcripts are expected to direct the synthesis of 
heparanases with altered functions. 

Additional objects, advantages, and novel features of the present 
invention will become apparent to one ordinarily skilled in the art upon 
examination of the following examples, which are not intended to be 
limiting. Additionally, each of the various embodiments and aspects of the 
present invention as delineated hereinabove and as claimed in the claims 
section below finds experimental support in the following examples. 

EXAMPLES 

Generally, the nomenclature used herein and the laboratory 
procedures in recombinant DNA technology described below are those well 
known and commonly employed in the art. Standard techniques are used for 
cloning, DNA and RNA isolation, amplification and purification. Generally 
enzymatic reactions involving DNA ligase, DNA polymerase, restriction 
endonucleases and the like are performed according to the manufacturers' 
specifications. These techniques and various other techniques are generally 
performed according to Sambrook et al., Molecular Cloning--A Laboratory 
Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1989), 
which is incorporated herein by reference. Other general references are 
provided throughout this document. The procedures therein are believed to 
be well known in the art and are provided for the convenience of the reader. 
All the information contained therein is incorporated herein by reference. 

The following protocols and experimental details are referenced in 
the Examples that follow: 

Purification and characterization of heparanase from a human 
hepatoma cell line and human placenta: A human hepatoma cell line 
(Sk-hep-1) was chosen as a source for purification of a human tumor-derived 
heparanase. Purification was essentially as described in U.S. Pat. No. 
5,362,641 to Fuks, which is incorporated by reference as if fully set forth 
herein. Briefly, 500 liter, 5x10^* cells were grown in suspension and the 
heparanase enzyme was purified about 240,000 fold by applying the 
following steps: (i) cation exchange (CM-Sephadex) chromatography 
performed at pH 6.0, 0.3-1.4 M NaCl gradient; (ii) cation exchange 
(CM-Sephadex) chromatography performed at pH 7.4 in the presence of 0.1 % 
CHAPS, 0.3-1.1 M NaCl gradient; (iii) heparin-Sepharose chromatography 
performed at pH 7.4 in the presence of 0.1 % CHAPS, 0.35-1.1 M NaCl 
gradient; (iv) ConA-Sepharose chromatography performed at pH 6.0 in 
buffer containing 0.1 % CHAPS and 1 M NaCl, elution with 0.25 M 
α-methyl mannoside; and (v) HPLC cation exchange (Mono-S) 
chromatography performed at pH 7.4 in the presence of 0.1 % CHAPS, 
0.25-1 M NaCl gradient. 

Active fractions were pooled, precipitated with TCA and the 
precipitate subjected to SDS polyacrylamide gel electrophoresis and/or 
tryptic digestion and reverse phase HPLC. Tryptic peptides of the purified 
protein were separated by reverse phase HPLC (C8 column) and 
homogeneous peaks were subjected to amino acid sequence analysis. 

The purified enzyme was applied to reverse phase HPLC and 
subjected to N-terminal amino acid sequencing using the amino acid 
sequencer (Applied Biosystems). 

Cells: Cultures of bovine corneal endothelial cells (BCECs) were 
established from steer eyes as previously described (19, 38). Stock cultures 
were maintained in DMEM (1 g glucose/liter) supplemented with 10 % 
newborn calf serum and 5 % FCS. bFGF (1 ng/ml) was added every other 
day during the phase of active cell growth (13, 14). 

Preparation of dishes coated with ECM: BCECs (second to fifth 
passage) were plated into 4-well plates at an initial density of 2 x 10^ 
cells/ml, and cultured in sulfate-free Fisher medium plus 5 % dextran T-40 
for 12 days. Na2[35S]O4 (25 µCi/ml) was added on day 1 and 5 after seeding 
and the cultures were incubated with the label without medium change. The 
subendothelial ECM was exposed by dissolving (5 min., room temperature) 
the cell layer with PBS containing 0.5 % Triton X-100 and 20 mM NH4OH, 
followed by four washes with PBS. The ECM remained intact, free of 
cellular debris and firmly attached to the entire area of the tissue culture 
dish (19, 22). 

To prepare soluble sulfate labeled proteoglycans (peak I material), 
the ECM was digested with trypsin (25 µg/ml, 6 h, 37 °C), the digest was 
concentrated by reverse dialysis and the concentrated material was applied 
onto a Sepharose 6B gel filtration column. The resulting high molecular 
weight material (Kav < 0.2, peak I) was collected. More than 80 % of the 
labeled material was shown to be composed of heparan sulfate 
proteoglycans (11, 39). 

Heparanase activity: Cells (1 x 10^6/35-mm dish), cell lysates or 
conditioned media were incubated on top of 35S-labeled ECM (18 h, 37 °C) 
in the presence of 20 mM phosphate buffer (pH 6.2). Cell lysates and 
conditioned media were also incubated with sulfate labeled peak I material 
(10-20 µl). The incubation medium was collected, centrifuged (18,000 x g, 
4 °C, 3 min.), and sulfate labeled material analyzed by gel filtration on a 
Sepharose CL-6B column (0.9 x 30 cm). Fractions (0.2 ml) were eluted 
with PBS at a flow rate of 5 ml/h and counted for radioactivity using 
Bio-fluor scintillation fluid. The excluded volume (Vo) was marked by blue 
dextran and the total included volume (Vt) by phenol red. The latter was 
shown to comigrate with free sulfate (7, 11, 23). Degradation fragments of 
HS side chains were eluted from Sepharose 6B at 0.5 < Kav < 0.8 (peak II) 
(7, 11, 23). A nearly intact HSPG released from ECM by trypsin - and, to a 
lower extent, during incubation with PBS alone - was eluted next to Vo 
(Kav < 0.2, peak I). Recoveries of labeled material applied on the columns 
ranged from 85 to 95 % in different experiments (11). Each experiment was 
performed at least three times and the variation of elution positions (Kav 
values) did not exceed +/- 15 %. 
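
The Kav values quoted in this protocol are not defined in the text. In gel filtration practice Kav is conventionally computed as (Ve - Vo) / (Vt - Vo), where Ve is the elution volume of a fraction, Vo the excluded volume and Vt the total included volume. The Python sketch below assumes that conventional definition; the function names and the example volumes are hypothetical illustrations, not values taken from the experiments.

```python
def kav(ve, vo, vt):
    """Partition coefficient Kav = (Ve - Vo) / (Vt - Vo) (conventional definition)."""
    if vt <= vo:
        raise ValueError("Vt must be larger than Vo")
    return (ve - vo) / (vt - vo)


def classify_peak(k):
    """Map a Kav value onto the peak ranges quoted in the protocol."""
    if k < 0.2:
        return "peak I"       # nearly intact HSPG, eluted next to Vo
    if 0.5 < k < 0.8:
        return "peak II"      # HS degradation fragments
    return "unassigned"


# Illustrative volumes in ml (assumed for the example, not measured data).
VO, VT = 2.0, 8.0
for ve in (2.6, 6.2):
    k = kav(ve, VO, VT)
    print(f"Ve = {ve} ml, Kav = {k:.2f}, {classify_peak(k)}")
```

With the fraction size given above (0.2 ml), Ve for a fraction can be approximated as the fraction number multiplied by 0.2 ml, again as an assumption about how fractions were counted.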

Cloning of hpa cDNA: cDNA clones 257548 and 260138 were 
obtained from the I.M.A.G.E Consortium (2130 Memorial Parkway SW, 
Huntsville, AL 35801). The cDNAs were originally cloned in EcoRI and 
NotI cloning sites in the plasmid vector pT3T7D-Pac. Although these 
clones are reported to be somewhat different, DNA sequencing 
demonstrated that these clones are identical to one another. Marathon 
RACE (rapid amplification of cDNA ends) human placenta (poly-A) cDNA 
composite was a gift of Prof. Yossi Shiloh of Tel Aviv University. This 
composite is vector free, as it includes reverse transcribed cDNA fragments 
to which double, partially single stranded adapters are attached on both 
sides. The construction of the specific composite employed is described in 
reference 39a. 

Amplification of hp3 PCR fragment was performed according to the 
protocol provided by Clontech laboratories. The template used for 
amplification was a sample taken from the above composite. The primers 
used for amplification were: 

First step: 5'-primer: AP1: 5'-CCATCCTAATACGACTCACTATAGGGC-3', 
SEQ ID NO:1; 3'-primer: HPL229: 5'-GTAGTGATGCCATGTAACTGAATC-3', 
SEQ ID NO:2. 

Second step: nested 5'-primer: AP2: 5'-ACTCACTATAGGGCTCGAGCGGC-3', 
SEQ ID NO:3; nested 3'-primer: HPL171: 
5'-GCATCTTAGCCGTCTTTCTTCG-3', SEQ ID NO:4. The HPL229 and 
HPL171 were selected according to the sequence of the EST clones. They 
include nucleotides 933-956 and 876-897 of SEQ ID NO:9, respectively. 
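
The nucleotide ranges given for HPL229 and HPL171 can be cross-checked against the primer lengths. The sketch below does this in Python and also prints each primer's reverse complement, on the assumption (not stated explicitly in the text) that these 3' primers are antisense to the strand listed in SEQ ID NO:9. The helper names are hypothetical.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def span_length(start, end):
    """Length of an inclusive, 1-based nucleotide range."""
    return end - start + 1


# Primer sequences and SEQ ID NO:9 coordinates as quoted in the text.
PRIMERS = {
    "HPL229": ("GTAGTGATGCCATGTAACTGAATC", 933, 956),
    "HPL171": ("GCATCTTAGCCGTCTTTCTTCG", 876, 897),
}

for name, (seq, start, end) in PRIMERS.items():
    matches = len(seq) == span_length(start, end)
    print(name, len(seq), "nt; range length matches:", matches)
    print("  reverse complement:", reverse_complement(seq))
```

Both primers have the same length as their quoted ranges (24 and 22 nucleotides), which is consistent with the coordinates but does not by itself confirm the sequence match.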

PCR program was 94 °C - 4 min., followed by 30 cycles of 94 °C - 
40 sec, 62 °C - 1 min., 72 °C - 2.5 min. Amplification was performed with 
Expand High Fidelity (Boehringer Mannheim). The resulting ca. 900 bp 
hp3 PCR product was digested with BfrI and PvuII. Clone 257548 (phpal) 
was digested with EcoRI, followed by end filling and was then further 
digested with BfrI. Thereafter the PvuII - BfrI fragment of the hp3 PCR 
product was cloned into the blunt end - BfrI end of clone phpal which 
resulted in having the entire cDNA cloned in pT3T7-pac vector, designated 
phpal. 

RT-PCR: RNA was prepared using TRI-Reagent (Molecular 
Research Center Inc.) according to the manufacturer's instructions. 1.25 µg 
were taken for reverse transcription reaction using MuMLV Reverse 
transcriptase (Gibco BRL) and Oligo (dT)15 primer, SEQ ID NO:5, 
(Promega). Amplification of the resultant first strand cDNA was 
performed with Taq polymerase (Promega). The following primers were 
used: 

HPU-355: 5'-TTCGATCCCAAGAAGGAATCAAC-3', SEQ ID NO:6, 
nucleotides 372-394 in SEQ ID NOs:9 or 11. 

HPL-229: 5'-GTAGTGATGCCATGTAACTGAATC-3', SEQ ID NO:7, 
nucleotides 933-956 in SEQ ID NOs:9 or 11. 

PCR program: 94 °C - 4 min., followed by 30 cycles of 94 °C - 40 
sec, 62 °C - 1 min., 72 °C - 1 min. 

Alternatively, total RNA was prepared from cell cultures using 
Tri-reagent (Molecular Research Center, Inc.) according to the manufacturer's 
recommendation. Poly A+ RNA was isolated from total RNA using mRNA 
separator (Clontech). Reverse transcription was performed with total RNA 
using Superscript II (GibcoBRL). PCR was performed with Expand high 
fidelity (Boehringer Mannheim). Primers used for amplification were as 
follows: 

Hpu-685, 5'-GAGGAGGGAGGTGAGGGGAAGAT-3', SEQ ID NO:24 
Hpu-355, 5'-TTCGATCCCAAGAAGGAATCAAC-3', SEQ ID NO:25 
Hpu 565, 5'-AGCTCTGTAGATGTGCTATACAC-3', SEQ ID NO:26 
Hpl 967, 5'-TCAGATGCAAGCAGCAACTTTGGC-3', SEQ ID NO:27 
Hpl 171, 5'-GCATCTTAGCCGTCTTTCTTCG-3', SEQ ID NO:28 
Hpl 229, 5'-GTAGTGATGCCATGTAACTGAATC-3', SEQ ID NO:29 

PCR reaction was performed as follows: 94 °C 3 minutes, followed 
by 32 cycles of 94 °C 40 seconds, 64 °C 1 minute, 72 °C 3 minutes, and one 
cycle 72 °C, 7 minutes. 
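
A cycling program of this kind can be written down as structured data, which makes it easy to compare the programs used in different protocols. The Python sketch below encodes the program just described and sums the programmed hold times. The Step class and function name are hypothetical, and the total ignores ramping between temperatures, which depends on the instrument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One programmed hold: temperature in degrees C and duration in seconds."""
    temp_c: float
    seconds: int


def total_hold_seconds(initial, cycle, n_cycles, final):
    """Sum of all programmed holds (temperature ramping is not included)."""
    per_cycle = sum(step.seconds for step in cycle)
    return initial.seconds + n_cycles * per_cycle + final.seconds


# Program as described in the text above.
INITIAL = Step(94, 3 * 60)
CYCLE = (Step(94, 40), Step(64, 60), Step(72, 3 * 60))
N_CYCLES = 32
FINAL = Step(72, 7 * 60)

total = total_hold_seconds(INITIAL, CYCLE, N_CYCLES, FINAL)
print(f"{total} s of programmed hold time ({total / 60:.1f} min), excluding ramping")
```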

Expression of recombinant heparanase in insect cells: Cells, High 
Five and Sf21 insect cell lines were maintained as monolayer cultures in 
SF900II-SFM medium (GibcoBRL). 

Recombinant Baculovirus: Recombinant virus containing the hpa 
gene was constructed using the Bac to Bac system (GibcoBRL). The 
transfer vector pFastBac was digested with SalI and NotI and ligated with a 
1.7 kb fragment of phpal digested with XhoI and NotI. The resulting 
plasmid was designated pFasthpa2. An identical plasmid designated 
pFasthpa4 was prepared as a duplicate and both independently served for 
further experimentations. Recombinant bacmid was generated according to 
the instructions of the manufacturer with pFasthpa2, pFasthpa4 and with 
pFastBac. The latter served as a negative control. Recombinant bacmid 
DNAs were transfected into Sf21 insect cells. Five days after transfection 
recombinant viruses were harvested and used to infect High Five insect 
cells, 3 x 10^ cells in T-25 flasks. Cells were harvested 2 - 3 days after 
infection. 4 x 10^ cells were centrifuged and resuspended in a reaction 
buffer containing 20 mM phosphate citrate buffer, 50 mM NaCl. Cells 
underwent three cycles of freeze and thaw and lysates were stored at -80 
°C. Conditioned medium was stored at 4 °C. 

Partial purification of recombinant heparanase: Partial 
purification of recombinant heparanase was performed by heparin- 
Sepharose column chromatography followed by Superdex 75 column gel 
filtration. Culture medium (150 ml) of Sf21 cells infected with pFhpa4 
virus was subjected to heparin-Sepharose chromatography. Elution of 1 ml 
fractions was performed with 0.35 - 2 M NaCl gradient in presence of 0.1 % 
CHAPS and 1 mM DTT in 10 mM sodium acetate buffer, pH 5.0. A 25 µl 
sample of each fraction was tested for heparanase activity. Heparanase 
activity was eluted at the range of 0.65 - 1.1 M NaCl (fractions 18-26, 
Figure 10a). 5 µl of each fraction was subjected to 15 % 
SDS-polyacrylamide gel electrophoresis followed by silver nitrate staining. 
Active fractions eluted from heparin-Sepharose (Figure 10a) were pooled 
and concentrated (x 6) on YM3 cut-off membrane. 0.5 ml of the 
concentrated material was applied onto 30 ml Superdex 75 FPLC column 
equilibrated with 10 mM sodium acetate buffer, pH 5.0, containing 0.8 M 
NaCl, 1 mM DTT and 0.1 % CHAPS. Fractions (0.56 ml) were collected at 
a flow rate of 0.75 ml/min. Aliquots of each fraction were tested for 
heparanase activity and were subjected to SDS-polyacrylamide gel 
electrophoresis followed by silver nitrate staining (Figure 11b). 

PCR amplification of genomic DNA: 94 °C 3 minutes, followed by 
32 cycles of 94 °C 45 seconds, 64 °C 1 minute, 68 °C 5 minutes, and one 
cycle at 72 °C, 7 minutes. Primers used for amplification of genomic DNA 
included: 

GHpu-L3 5'-AGGCACCCTAGAGATGTTCCAG-3', SEQ ID NO:30 
GHpl-L6 5'-GAAGATTTCTGTTTCCATGACGTG-3', SEQ ID NO:31. 

Screening of genomic libraries: A human genomic library in 
Lambda phage EMBL3 SP6/T7 (Clontech, Palo Alto, CA) was screened. 
5 x 10^5 plaques were plated at 5 x 10^ pfu/plate on NZCYM agar/top 
agarose plates. Phages were absorbed on nylon membranes in duplicates 
(Qiagen). Hybridization was performed at 65 °C in 5 x SSC, 5 x Denhardt's, 
10 % dextran sulfate, 100 µg/ml salmon sperm DNA, 32P labeled probe (10^6 
cpm/ml). A 1.6 kb fragment, containing the entire hpa cDNA was labeled 
by random priming (Boehringer Mannheim). Following hybridization 
membranes were washed once with 2 x SSC, 0.1 % SDS at 65 °C for 20 
minutes, and twice with 0.2 x SSC, 0.1 % SDS at 65 °C for 15 minutes. 
Hybridizing plaques were picked, and plated at 100 pfu/plate. 
Hybridization was performed as above and single isolated positive plaques 
were picked. 

Phage DNA was extracted using a Lambda DNA extraction kit 
(Qiagen). DNA was digested with XhoI and EcoRI, separated on 0.7 % 
agarose gel and transferred to nylon membrane Hybond N+ (Amersham). 
Hybridization and washes were performed as above. 

cDNA Sequence analysis: Sequence determinations were performed 
with vector specific and gene specific primers, using an automated DNA 
sequencer (Applied Biosystems, model 373A). Each nucleotide was read 
from at least two independent primers. 

Genomic sequence analysis: Large-scale sequencing was performed 
by Commonwealth Biotechnology Incorporation. 

Isolation of mouse hpa: Mouse hpa cDNA was amplified from 
either Marathon ready cDNA library of mouse embryo or from mRNA 
isolated from mouse melanoma cell line BL6, using the Marathon RACE kit 
from Clontech. Both procedures were performed according to the 
manufacturer's recommendation. 

Primers used for PCR amplification of mouse hpa: 
Mhpl773 5'-CCACACTGAATGTAATACTGAAGTG-3', SEQ ID NO:32 
MHpl736 5'-CGAAGCTCTGGAACTCGGCAAG-3', SEQ ID NO:33 
MHpl83 5'-GCCAGCTGCAAAGGTGTTGGAC-3', SEQ ID NO:34 
Mhpl152 5'-AACACCTGCCTCATCACGACTTC-3', SEQ ID NO:35 
Mhpl114 5'-GCCAGGCTGGCGTCGATGGTGA-3', SEQ ID NO:36 
MHpl103 5'-GTCGATGGTGATGGACAGGAAC-3', SEQ ID NO:37 
Ap1 5'-GTAATACGACTCACTATAGGGC-3', SEQ ID NO:38 (Genome walker) 
Ap2 5'-ACTATAGGGCACGCGTGGT-3', SEQ ID NO:39 (Genome walker) 
Ap1 5'-CCATCCTAATACGACTCACTATAGGGC-3', SEQ ID NO:40 (Marathon RACE) 
Ap2 5'-ACTCACTATAGGGCTCGAGCGGC-3', SEQ ID NO:41 (Marathon RACE) 

Southern analysis of genomic DNA: Genomic DNA was extracted 
from animal or from human blood using Blood and cell culture DNA maxi 
kit (Qiagen). DNA was digested with EcoRI, separated by gel 
electrophoresis and transferred to a nylon membrane Hybond N+ 
(Amersham). Hybridization was performed at 68 °C in 6 x SSC, 1 % SDS, 
5 x Denhardt's, 10 % dextran sulfate, 100 µg/ml salmon sperm DNA, and 32P 
labeled probe. A 1.6 kb fragment, containing the entire hpa cDNA was 
used as a probe. Following hybridization, the membrane was washed with 3 
x SSC, 0.1 % SDS, at 68 °C and exposed to X-ray film for 3 days. 
Membranes were then washed with 1 x SSC, 0.1 % SDS, at 68 °C and were 
reexposed for 5 days. 

Construction of hpa promoter-GFP expression vector: Lambda 
DNA of phage L3 was digested with SacI and BglII, resulting in a 1712 bp 
fragment which contained the hpa promoter (877-2688 of SEQ ID NO:42). 
The pEGFP-1 plasmid (Clontech) was digested with BglII and SacI and 
ligated with the 1712 bp fragment of the hpa promoter sequence. The 
resulting plasmid was designated phpEGL. A second hpa promoter-GFP 
plasmid was constructed containing a shorter fragment of the hpa promoter 
region: phpEGL was digested with HindIII, and the resulting 1095 bp 
fragment (nucleotides 1593-2688 of SEQ ID NO:42) was ligated with 
HindIII digested pEGFP-1. The resulting plasmid was designated phpEGS. 

Computer analysis of sequences: Homology searches were 
performed using several computer servers, and various databases. Blast 2.0 
service, at the NCBI server was used to screen the protein database swplus 
and DNA databases such as GenBank, EMBL, and the EST databases. 
Blast 2.0 search was performed using the basic search option of the NCBI 
server. Sequence analysis and alignments were done using the DNA 
sequence analysis software package developed by the Genetic Computer 
Group (GCG) at the University of Wisconsin. Alignments of two sequences 
were performed using Bestfit (gap creation penalty - 12, gap extension 
penalty - 4). Protein homology search was performed with the 
Smith-Waterman algorithm, using the Bioaccelerator platform developed by 
Compugene. The protein database swplus was searched using the following 
parameters: gapop: 10.0, gapext: 0.5, matrix: blosum62. Blocks homology 
was performed using the Blocks WWW server developed at Fred 
Hutchinson Cancer Research Center in Seattle, Washington, USA. 
Secondary structure prediction was performed using the PHD server - 
Profile network Prediction Heidelberg. Fold recognition (threading) was 
performed using the UCLA-DOE structure prediction server. The method 
used for prediction was gonnet+predss. Alignment of three sequences was 
performed using the pileup application (gap creation penalty - 5, gap 
extension penalty - 1). Promoter analysis was performed using TSSW and 
TSSG programs (BCM Search Launcher Human Genome Center, Baylor 
College of Medicine, Houston TX). 

EXAMPLE 1 
Cloning of human hpa cDNA 

Purified fraction of heparanase isolated from human hepatoma cells 
(SK-hep-1) was subjected to tryptic digestion and microsequencing. EST 
(Expressed Sequence Tag) databases were screened for homology to the 
back translated DNA sequences corresponding to the obtained peptides. 
Two EST sequences (accession Nos. N41349 and N45367) contained a 
DNA sequence encoding the peptide YGPDVGQPR (SEQ ID NO:8). 
These two sequences were derived from clones 257548 and 260138 
(I.M.A.G.E Consortium) prepared from 8 to 9 weeks placenta cDNA library 
(Soares). Both clones, which were found to be identical, contained an insert 
of 1020 bp which included an open reading frame (ORF) of 973 bp 
followed by a 3' untranslated region of 27 bp and a Poly A tail. No 
translation start site (AUG) was identified at the 5' end of these clones. 
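
The back-translation screen described above can be illustrated in a few lines of Python. Because the genetic code is degenerate, a nine-residue peptide such as YGPDVGQPR corresponds to many possible coding sequences, so in practice a database sequence is translated in each reading frame and searched for the peptide. The sketch below assumes the standard genetic code and checks only the three forward frames; the function names and the example sequence are hypothetical and are not taken from the EST entries cited in the text.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Reverse lookup: amino acid -> list of codons that encode it.
CODONS_FOR = {}
for codon, aa in CODON_TABLE.items():
    CODONS_FOR.setdefault(aa, []).append(codon)


def n_back_translations(peptide):
    """Number of distinct DNA sequences that encode the peptide."""
    count = 1
    for aa in peptide:
        count *= len(CODONS_FOR[aa])
    return count


def translate(dna):
    """Translate complete codons of a DNA string; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def encodes_peptide(dna, peptide):
    """True if the peptide occurs in any of the three forward reading frames."""
    return any(peptide in translate(dna[frame:]) for frame in range(3))


PEPTIDE = "YGPDVGQPR"
# One arbitrary coding sequence for the peptide, padded to shift the frame.
example = "AC" + "".join(CODONS_FOR[aa][0] for aa in PEPTIDE) + "TGA"

print(n_back_translations(PEPTIDE), "possible coding sequences")
print("example encodes peptide:", encodes_peptide(example, PEPTIDE))
```

A fuller screen would also translate the reverse complement of each database entry, since EST orientation is not always known.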

Cloning of the missing 5' end was performed by PCR amplification 
of DNA from a placenta Marathon RACE cDNA composite. A 900 bp 
fragment (designated hp3), partially overlapping with the identified 3' 
encoding EST clones was obtained. 

The joined cDNA fragment, 1721 bp long (SEQ ID NO:9), contained 
an open reading frame which encodes, as shown in Figure 1 and SEQ ID 
NO:11, a polypeptide of 543 amino acids (SEQ ID NO:10) with a 
calculated molecular weight of 61,192 daltons. The 3' end of the partial 
cDNA inserts contained in clones 257548 and 260138 started at nucleotide 
G721 of SEQ ID NO:9 and Figure 1. 

As further shown in Figure 1, there was a single sequence 
discrepancy between the EST clones and the PCR amplified sequence, 
which led to an amino acid substitution from Tyr246 in the EST to Phe246 
in the amplified cDNA. The nucleotide sequence of the PCR amplified cDNA 
fragment was verified from two independent amplification products. The 
new gene was designated hpa. 

As stated above, the 3' end of the partial cDNA inserts contained in 
EST clones 257548 and 260138 started at nucleotide 721 of hpa (SEQ ID 
NO:9). The ability of the hpa cDNA to form stable secondary structures, 
such as stem and loop structures involving nucleotide stretches in the 
vicinity of position 721 was investigated using computer modeling. It was 
found that stable stem and loop structures are likely to be formed involving 
nucleotides 698-724 (SEQ ID NO:9). In addition, a high GC content, up to 
70 %, characterizes the 5' end region of the hpa gene, as compared to about 
only 40 % in the 3' region. These findings may explain the premature 
termination and therefore lack of 5' ends in the EST clones. 
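
Regional GC content of the kind compared above can be estimated with a simple windowed count. The Python sketch below is a minimal illustration; the function names, window size and toy sequence are assumptions chosen for the example and do not reproduce the analysis of SEQ ID NO:9.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a non-empty DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return sum(base in "GC" for base in seq) / len(seq)


def windowed_gc(seq, window, step):
    """Return (1-based start, GC fraction) for each full window along seq."""
    return [
        (start + 1, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence: a GC-rich stretch followed by an AT-rich stretch (not real data).
toy = "GCGCGGCCGC" + "ATATTAGCAT"
for start, gc in windowed_gc(toy, window=10, step=10):
    print(f"window starting at nt {start}: {gc:.0%} GC")
```

Applied to a full cDNA with a window of, say, 100 nucleotides, the same function would show whether the 5' region is markedly more GC-rich than the 3' region.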

To examine the ability of the hpa gene product to catalyze 
degradation of heparan sulfate in an in vitro assay the entire open reading 
frame was expressed in insect cells, using the Baculovirus expression 
system. Extracts of cells, infected with virus containing the hpa gene, 
demonstrated a high level of heparan sulfate degradation activity, while 
cells infected with a similar construct containing no hpa gene had no such 
activity, nor did non-infected cells. These results are further demonstrated 
in the following Examples. 

EXAMPLE 2 
Degradation of soluble ECM-derived HSPG 

Monolayer cultures of High Five cells were infected (72 h, 28 °C) 
with recombinant Baculovirus containing the pFasthpa plasmid or with 
control virus containing an insert free plasmid. The cells were harvested 
and lysed in heparanase reaction buffer by three cycles of freezing and 
thawing. The cell lysates were then incubated (18 h, 37 °C) with sulfate 
labeled, ECM-derived HSPG (peak I), followed by gel filtration analysis 
(Sepharose 6B) of the reaction mixture. 

As shown in Figure 2, the substrate alone included almost entirely 
high molecular weight (Mr) material eluted next to Vo (peak I, fractions 
5-20, Kav < 0.35). A similar elution pattern was obtained when the HSPG 
substrate was incubated with lysates of cells that were infected with control 
virus. In contrast, incubation of the HSPG substrate with lysates of cells 
infected with the hpa containing virus resulted in a complete conversion of 
the high Mr substrate into low Mr labeled degradation fragments (peak II, 
fractions 22-35, 0.5 < Kav < 0.75). 

Fragments eluted in peak II were shown to be degradation products 
of heparan sulfate, as they were (i) 5- to 6- fold smaller than intact heparan 
sulfate side chains (Kav approx. 0.33) released from ECM by treatment 
with either alkaline borohydride or papain; and (ii) resistant to further 
digestion with papain or chondroitinase ABC, and susceptible to 
deamination by nitrous acid (6, 11). Similar results (not shown) were 
obtained with Sf21 cells. Again, heparanase activity was detected in cells 
infected with the hpa containing virus (pFhpa), but not with control virus 
(pF). This result was obtained with two independently generated 
recombinant viruses. Lysates of control not infected High Five cells failed 
to degrade the HSPG substrate. 

In subsequent experiments, the labeled HSPG substrate was 
incubated with medium conditioned by infected High Five or Sf21 cells. 

As shown in Figures 3a-b, heparanase activity, reflected by the 
conversion of the high Mr peak I substrate into the low Mr peak II which 
represents HS degradation fragments, was found in the culture medium of 
cells infected with the pFhpa2 or pFhpa4 viruses, but not with the control 
pFl or pF2 viruses. No heparanase activity was detected in the culture 
medium of control non-infected High Five or Sf21 cells. 

The medium of cells infected with the pFhpa4 virus was passed
through a 50 kDa cut off membrane to obtain a crude estimation of the
molecular weight of the recombinant heparanase enzyme. As demonstrated
in Figure 4, all the enzymatic activity was retained in the upper
compartment and there was no activity in the flow through (<50 kDa)
material. This result is consistent with the expected molecular weight of the
hpa gene product.

In order to further characterize the hpa product the inhibitory effect 
of heparin, a potent inhibitor of heparanase mediated HS degradation (40) 
was examined. 

As demonstrated in Figures 5a-b, conversion of the peak I substrate
into peak II HS degradation fragments was completely abolished in the
presence of heparin.

Altogether, these results indicate that the heparanase enzyme is 
expressed in an active form by insect cells infected with Baculovirus 
containing the newly identified human hpa gene. 


EXAMPLE 3 
Degradation of HSPG in intact ECM

Next, the ability of intact infected insect cells to degrade HS in
intact, naturally produced ECM was investigated. For this purpose, High
Five or Sf21 cells were seeded on metabolically sulfate labeled ECM
followed by infection (48 h, 28 °C) with either the pFhpa4 or control pF2
viruses. The pH of the medium was then adjusted to pH 6.2-6.4 and the
cells further incubated with the labeled ECM for another 48 h at 28 °C or 24
h at 37 °C. Sulfate labeled material released into the incubation medium
was analyzed by gel filtration on Sepharose 6B.

As shown in Figures 6a-b and 7a-b, incubation of the ECM with cells
infected with the control pF2 virus resulted in a constant release of labeled
material that consisted almost entirely (>90%) of high Mr fragments (peak
I) eluted with or next to Vo. It was previously shown that a proteolytic
activity residing in the ECM itself and/or expressed by cells is responsible
for release of the high Mr material (6). This nearly intact HSPG provides a
soluble substrate for subsequent degradation by heparanase, as also
indicated by the relatively large amount of peak I material accumulating
when the heparanase enzyme is inhibited by heparin (6, 7, 12, Figure 9).
On the other hand, incubation of the labeled ECM with cells infected with
the pFhpa4 virus resulted in release of 60-70% of the ECM-associated
radioactivity in the form of low Mr sulfate-labeled fragments (peak II, 0.5
< Kav < 0.75), regardless of whether the infected cells were incubated with
the ECM at 28 °C or 37 °C. Control intact non-infected Sf21 or High Five
cells failed to degrade the ECM HS side chains.

In subsequent experiments, as demonstrated in Figures 8a-b, High
Five and Sf21 cells were infected (96 h, 28 °C) with pFhpa4 or control pF1
viruses and the culture medium incubated with sulfate-labeled ECM. Low
Mr HS degradation fragments were released from the ECM only upon
incubation with medium conditioned by pFhpa4 infected cells. As shown in
Figure 9, production of these fragments was abolished in the presence of
heparin. No heparanase activity was detected in the culture medium of
control, non-infected cells. These results indicate that the heparanase
enzyme expressed by cells infected with the pFhpa4 virus is capable of
degrading HS when complexed to other macromolecular constituents (i.e.
fibronectin, laminin, collagen) of a naturally produced intact ECM, in a
manner similar to that reported for highly metastatic tumor cells or activated
cells of the immune system (6, 7).

EXAMPLE 4 
Purification of recombinant human heparanase 

The recombinant heparanase was partially purified from medium of
pFhpa4 infected Sf21 cells by Heparin-Sepharose chromatography (Figure
10a) followed by gel filtration of the pooled active fractions over an FPLC
Superdex 75 column (Figure 11a). A ~63 kDa protein was observed,
whose quantity, as detected by silver stained SDS-polyacrylamide gel
electrophoresis, correlated with heparanase activity in the relevant column
fractions (Figures 10b and 11b, respectively). This protein was not detected
in the culture medium of cells infected with the control pF1 virus that was
subjected to a similar fractionation on heparin-Sepharose (not shown).

EXAMPLE 5
Expression of the human hpa cDNA in various cell types, organs and tissues

Referring now to Figures 12a-e, RT-PCR was applied to evaluate the 
expression of the hpa gene by various cell types and tissues. For this 
purpose, total RNA was reverse transcribed and amplified. The expected 
585 bp long cDNA was clearly demonstrated in human kidney, placenta (8 
and 11 weeks) and mole tissues, as well as in freshly isolated and
short-term (1.5-48 h) cultured human placental cytotrophoblastic cells (Figure
12a), all known to express a high heparanase activity (41). The hpa
transcript was also expressed by normal human neutrophils (Figure 12b). In
contrast, there was no detectable expression of the hpa mRNA in embryonic 
human muscle tissue, thymus, heart and adrenal (Figure 12b). The hpa 
gene was expressed by several, but not all, human bladder carcinoma cell 
lines (Figure 12c), SK hepatoma (SK-hep-1), ovarian carcinoma (OV 1063),
breast carcinoma (435, 231), melanoma and megakaryocytic (DAMI, 
CHRF) human cell lines (Figures 12d-e). 

The above described expression pattern of the hpa transcript
correlated very well with the heparanase activity levels
determined in various tissues and cell types (not shown).

EXAMPLE 6 

Isolation of an extended 5' end of hpa cDNA from human SK-hep1 cell line

The 5' end of hpa cDNA was isolated from human SK-hep1 cell line
by PCR amplification using the Marathon RACE (rapid amplification of
cDNA ends) kit (Clontech). Total RNA was prepared from SK-hep1 cells
using the TRI-Reagent (Molecular Research Center Inc.) according to the
manufacturer's instructions. Poly A+ RNA was isolated using the mRNA
separator kit (Clontech).

The Marathon RACE SK-hep1 cDNA composite was constructed
according to the manufacturer's recommendations. The first round of
amplification was performed using an adaptor specific primer AP1:
5'-CCATCCTAATACGACTCACTATAGGGC-3', SEQ ID NO:1, and a hpa
specific antisense primer hpl-629: 5'-CCCCAGGAGCAGCAGCATCAG-3',
SEQ ID NO:17, corresponding to nucleotides 119-99 of SEQ ID NO:9.
The resulting PCR product was subjected to a second round of amplification
using an adaptor specific nested primer AP2:
5'-ACTCACTATAGGGCTCGAGCGGC-3', SEQ ID NO:3, and a hpa
specific antisense nested primer hpl-666:
5'-AGGCTTCGAGCGCAGCAGCAT-3', SEQ ID NO:18, corresponding to
nucleotides 83-63 of SEQ ID NO:9. The PCR program was as follows: a
hot start of 94 °C for 1 minute, followed by 30 cycles of 90 °C - 30 seconds,
68 °C - 4 minutes. The resulting 300 bp DNA fragment was extracted from
an agarose gel and cloned into the vector pGEM-T Easy (Promega). The
resulting recombinant plasmid was designated pHPSK1.
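
The stated primer coordinates can be checked against the primer lengths. The sketch below is illustrative only: the primer sequences and coordinates are taken from the text above, whereas the helper names are hypothetical and the coordinates are assumed to be inclusive positions in SEQ ID NO:9.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def span_length(start, end):
    """Length of an inclusive coordinate span, in either orientation."""
    return abs(start - end) + 1


# Antisense primers as quoted in the text: (sequence, first nt, last nt).
PRIMERS = {
    "hpl-629": ("CCCCAGGAGCAGCAGCATCAG", 119, 99),
    "hpl-666": ("AGGCTTCGAGCGCAGCAGCAT", 83, 63),
}

# True where the primer length equals the length of its stated span.
length_ok = {
    name: len(seq) == span_length(start, end)
    for name, (seq, start, end) in PRIMERS.items()
}

# Sense-strand sequence that the nested primer hpl-666 would anneal to.
hpl666_target = reverse_complement(PRIMERS["hpl-666"][0])
```

Both primers are 21 nucleotides long, in agreement with the 21-nucleotide spans given, and the nested primer hpl-666 (83-63) lies upstream of hpl-629 (119-99), as expected for a second round 5' RACE primer.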

The nucleotide sequence of the pHPSK1 insert was determined and it
was found to contain 62 nucleotides of the 5' end of the placenta hpa cDNA
(SEQ ID NO:9) and an additional 178 nucleotides upstream, the first 178
nucleotides of SEQ ID NOs:13 and 15.

A single nucleotide discrepancy was identified between the SK-hep1
cDNA and the placenta cDNA. The "T" derivative at position 9 of the
placenta cDNA (SEQ ID NO:9), is replaced by a "C" derivative at the
corresponding position 187 of the SK-hep1 cDNA (SEQ ID NO:13).

The discrepancy is likely to be due to a mutation at the 5' end of the
placenta cDNA clone, as confirmed by sequence analysis of several
additional cDNA clones isolated from placenta, which like the SK-hep1
cDNA contained C at position 9 of SEQ ID NO:9.

The 5' extended sequence of the SK-hep1 hpa cDNA was assembled
with the sequence of the hpa cDNA isolated from human placenta (SEQ ID
NO:9). The assembled sequence contained an open reading frame which
encodes, as shown in SEQ ID NOs:14 and 15, a polypeptide of 592 amino
acids with a calculated molecular weight of 66,407 daltons. The open
reading frame is flanked by a 93 bp 5' untranslated region (UTR).

EXAMPLE 7 

Isolation of the upstream genomic region of the hpa gene 
The upstream region of the hpa gene was isolated using the Genome
Walker kit (Clontech) according to the manufacturer's recommendations.
The kit includes five human genomic DNA samples each digested with a
different restriction endonuclease creating blunt ends: EcoRV, ScaI, DraI,
PvuII and SspI.

The blunt ended DNA fragments are ligated to partially single
stranded adaptors. The genomic DNA samples were subjected to PCR
amplification using the adaptor specific primer and a gene specific primer.
Amplification was performed with Expand High Fidelity (Boehringer
Mannheim).

A first round of amplification was performed using the ap1 primer:
5'-GTAATACGACTCACTATAGGGC-3', SEQ ID NO:19, and the hpa
specific antisense primer hpl-666: 5'-AGGCTTCGAGCGCAGCAGCAT-3',
SEQ ID NO:18, corresponding to nucleotides 83-63 of SEQ ID NO:9.
The PCR program was as follows: a hot start of 94 °C - 3 minutes, followed
by 36 cycles of 94 °C - 40 seconds, 67 °C - 4 minutes.

The PCR products of the first amplification were diluted 1:50. One
µl of the diluted sample was used as a template for a second amplification
using a nested adaptor specific primer ap2:
5'-ACTATAGGGCACGCGTGGT-3', SEQ ID NO:20, and a hpa specific
antisense primer hpl-690, 5'-CTTGGGCTCACCTGGCTGCTC-3', SEQ ID
NO:21, corresponding to nucleotides 62-42 of SEQ ID NO:9. The resulting
amplification products were analyzed using agarose gel electrophoresis.
Five different PCR products were obtained from the five amplification
reactions. A DNA fragment of approximately 750 bp which was obtained
from the SspI digested DNA sample was gel extracted. The purified
fragment was ligated into the plasmid vector pGEM-T Easy (Promega).
The resulting recombinant plasmid was designated pGHP6905 and the
nucleotide sequence of the hpa insert was determined.

A partial sequence of 594 nucleotides is shown in SEQ ID NO:16.
The last nucleotide in SEQ ID NO:16 corresponds to nucleotide 93 in SEQ
ID NO:13. The DNA sequence in SEQ ID NO:16 contains the 5' region of the
hpa cDNA and 501 nucleotides of the genomic upstream region which are
predicted to contain the promoter region of the hpa gene.

EXAMPLE 8 

Expression of the 592 amino acids HPA polypeptide in a human 293 cell line

The 592 amino acids open reading frame (SEQ ID NOs:13 and 15)
was constructed by ligation of the 110 bp corresponding to the 5' end of the
SK-hep1 hpa cDNA with the placenta cDNA. More specifically, the
Marathon RACE - PCR amplification product of the placenta hpa DNA was
digested with SacI and an approximately 1 kb fragment was ligated into a
SacI-digested pGHP6905 plasmid. The resulting plasmid was digested with
EarI and AatII. The EarI sticky ends were blunted and an approximately
280 bp EarI/blunt-AatII fragment was isolated. This fragment was ligated
with pFasthpa digested with EcoRI which was blunt ended using Klenow
fragment and further digested with AatII. The resulting plasmid contained a
1827 bp insert which includes an open reading frame of 1776 bp, 31 bp of
3' UTR and 21 bp of 5' UTR. This plasmid was designated pFastLhpa.

A mammalian expression vector was constructed to drive the
expression of the 592 amino acids heparanase polypeptide in human cells.
The hpa cDNA was excised from pFastLhpa with BssHII and NotI. The
resulting 1850 bp BssHII-NotI fragment was ligated to a mammalian
expression vector pSI (Promega) digested with MluI and NotI. The
resulting recombinant plasmid, pShpaMet2, was transfected into a human
293 embryonic kidney cell line.

Transient expression of the 592 amino-acids heparanase was
examined by western blot analysis and the enzymatic activity was tested
using the gel shift assay. Both these procedures are described at length in
U.S. Pat. application No. 09/071,739, filed May 1, 1998, which is
incorporated by reference as if fully set forth herein. Cells were harvested 3
days following transfection. Harvested cells were re-suspended in lysis
buffer containing 150 mM NaCl, 50 mM Tris pH 7.5, 1% Triton X-100, 1
mM PMSF and protease inhibitor cocktail (Boehringer Mannheim). 40 µg
protein extract samples were used for separation on SDS-PAGE. Proteins
were transferred onto a PVDF Hybond-P membrane (Amersham). The
membrane was incubated with an affinity purified polyclonal
anti-heparanase antibody, as described in U.S. Pat. application No. 09/071,739.
A major band of approximately 50 kDa was observed in the transfected
cells as well as a minor band of approximately 65 kDa. A similar pattern
was observed in extracts of cells transfected with the pShpa as
demonstrated in U.S. Pat. application No. 09/071,739. These two bands
probably represent two forms of the recombinant heparanase protein
produced by the transfected cells. The 65 kDa protein probably represents a
heparanase precursor, while the 50 kDa protein is suggested herein to be the
processed or mature form.

The catalytic activity of the recombinant protein expressed in the
pShpaMet2 transfected cells was tested by gel shift assay. Cell extracts of
transfected and of mock transfected cells were incubated overnight with
heparin (6 µg in each reaction) at 37 °C, in the presence of 20 mM
phosphate citrate buffer pH 5.4, 1 mM CaCl2, 1 mM DTT and 50 mM
NaCl. Reaction mixtures were then separated on a 10 % polyacrylamide
gel. The catalytic activity of the recombinant heparanase was clearly
demonstrated by a faster migration of the heparin molecules incubated with
the transfected cell extract as compared to the control. Faster migration
indicates the disappearance of high molecular weight heparin molecules and
the generation of low molecular weight degradation products.

EXAMPLE 9 
Chromosomal localization of the hpa gene 
Chromosomal mapping of the hpa gene was performed utilizing a
panel of monochromosomal human/CHO and human/mouse somatic cell
hybrids, obtained from the UK HGMP Resource Center (Cambridge,
England).

40 ng of each of the somatic cell hybrid DNA samples were
subjected to PCR amplification using the hpa primers: hpu565
5'-AGCTCTGTAGATGTGCTATACAC-3', SEQ ID NO:22, corresponding
to nucleotides 564-586 of SEQ ID NO:9 and an antisense primer hpl171
5'-GCATCTTAGCCGTCTTTCTTCG-3', SEQ ID NO:23, corresponding to
nucleotides 897-876 of SEQ ID NO:9.

The PCR program was as follows: a hot start of 94 °C - 3 minutes,
followed by 7 cycles of 94 °C - 45 seconds, 66 °C - 1 minute, 68 °C - 5
minutes, followed by 30 cycles of 94 °C - 45 seconds, 62 °C - 1 minute, 68
°C - 5 minutes, and a 10 minutes final extension at 72 °C.
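
The programmed hold time of such a two-stage PCR program can be summed as follows. This is a rough sketch only: the step durations are those given above, the function name is hypothetical, and temperature ramping between steps, which depends on the thermal cycler, is ignored.

```python
def program_seconds(hot_start, stages, final_extension=0):
    """Sum programmed hold times; stages is a list of (cycles, [step seconds])."""
    total = hot_start + final_extension
    for cycles, steps in stages:
        total += cycles * sum(steps)
    return total


# Chromosomal mapping program: 3 min hot start, 7 cycles with 66 °C annealing,
# 30 cycles with 62 °C annealing, 10 min final extension.
mapping_seconds = program_seconds(
    hot_start=3 * 60,
    stages=[
        (7, [45, 60, 5 * 60]),
        (30, [45, 60, 5 * 60]),
    ],
    final_extension=10 * 60,
)
mapping_hours = mapping_seconds / 3600
```

Under these assumptions the program holds for 15,765 seconds, i.e., somewhat under four and a half hours excluding ramp times.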

The reactions were performed with Expand long PCR (Boehringer
Mannheim). The resulting amplification products were analyzed using
agarose gel electrophoresis. As demonstrated in Figure 14, a single band of
approximately 2.8 kb was obtained from chromosome 4, as well as from
the control human genomic DNA. A 2.8 kb amplification product is
expected based on amplification of the genomic hpa clone (data not shown).
No amplification products were obtained either in the control DNA
samples of hamster and mouse or in somatic hybrids of other human
chromosomes.


EXAMPLE 10 
Human genomic clone encoding heparanase 
Five plaques were isolated following screening of a human genomic
library and were designated L3-1, L5-1, L8-1, L10-1 and L6-1. The phage
DNAs were analyzed by Southern hybridization and by PCR with hpa
specific and vector specific primers. Southern analysis was performed with
three fragments of hpa cDNA: a PvuII-BamHI fragment (nucleotides 32-
450, SEQ ID NO:9), a BamHI-NdeI fragment (nucleotides 451-1102, SEQ
ID NO:9) and an NdeI-XhoI fragment (nucleotides 1103-1721, SEQ ID
NO:9).

Following Southern analysis, phages L3, L6, L8 were selected for
further analysis. A scheme of the genomic region and the relative position
of the three phage clones is depicted in Figure 15. A 2 kb DNA fragment
containing the gap between phages L6 and L3 was PCR amplified from
human genomic DNA with two gene specific primers GHpuL3 and
GHpIL6. The PCR product was cloned into the plasmid vector pGEM-T
Easy (Promega).

Large scale DNA sequencing of the three Lambda clones and the
amplified fragment was performed with Lambda purified DNA by primer
walking. A nucleotide sequence of 44,898 bp was analyzed (Figure 16,
SEQ ID NO:42). Comparison of the genomic sequence with that of hpa
cDNA revealed 12 exons separated by 11 introns (Figures 15 and 16). The
genomic organization of the hpa gene is depicted in Figure 15 (top). The
sequence includes the coding region from the first ATG to the stop codon,
which spans 39,113 nucleotides, 2742 nucleotides upstream of the first
ATG and 3043 nucleotides downstream of the stop codon. Splice site
consensus sequences were identified at exon/intron junctions.
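
The segment lengths given above can be cross-checked against the total length of the analyzed sequence. The sketch below uses only the numbers stated in the text; the function and variable names are hypothetical, and 1-based positions in SEQ ID NO:42 are assumed.

```python
UPSTREAM_NT = 2742       # nucleotides upstream of the first ATG
CODING_SPAN_NT = 39113   # first ATG through the stop codon, introns included
DOWNSTREAM_NT = 3043     # nucleotides downstream of the stop codon
EXONS = 12


def genomic_layout(upstream, coding_span, downstream):
    """Derive 1-based landmark positions from the three segment lengths."""
    first_atg = upstream + 1
    stop_codon_end = upstream + coding_span
    total = stop_codon_end + downstream
    return {"first_atg": first_atg, "stop_codon_end": stop_codon_end, "total": total}


layout = genomic_layout(UPSTREAM_NT, CODING_SPAN_NT, DOWNSTREAM_NT)
introns = EXONS - 1  # n exons are separated by n - 1 introns
```

The three segments add up to the 44,898 bp reported for the analyzed sequence, and 12 exons imply the 11 introns stated.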



EXAMPLE 11 
Alternative splicing 
Several minor RT-PCR products were obtained from various cell
types, following amplification with hpa specific primers. Each one was found to
contain a deletion of one or two exons. Some of these PCR products
contain ORFs, which encode potential shorter proteins.

Table 1 below summarizes the alternative spliced products isolated
from various cell lines. Fragments of similar sizes were obtained following
amplification with two cell lines, placenta and platelets.

Cell type                  Nucleotides deleted   Exons deleted   ORF

Platelets                  1047-1267             8, 9
Platelets                  1154-1267             9
Platelets                  289-435, 562-735      2, 4
Sk-hep1, platelets, Zr75   562-735               4
Sk-hep1 (hepatoma)         561-904               4, 5
Zr75 (breast carcinoma)    96-203                1 (partial)
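
Whether a given deletion leaves the downstream reading frame unchanged can be estimated from its length. The following is a hedged sketch only: it assumes that the ranges in Table 1 are inclusive cDNA coordinates, the names are hypothetical, and a frame-preserving deletion length (a multiple of three) does not by itself establish that a functional ORF is retained.

```python
# Deleted ranges from Table 1, read as inclusive coordinates (an assumption).
SPLICE_VARIANTS = [
    ("Platelets", [(1047, 1267)]),
    ("Platelets", [(1154, 1267)]),
    ("Platelets", [(289, 435), (562, 735)]),
    ("Sk-hep1, platelets, Zr75", [(562, 735)]),
    ("Sk-hep1 (hepatoma)", [(561, 904)]),
    ("Zr75 (breast carcinoma)", [(96, 203)]),
]


def deleted_length(ranges):
    """Total number of nucleotides removed by a set of inclusive ranges."""
    return sum(end - start + 1 for start, end in ranges)


def keeps_frame(ranges):
    """True when the deleted length is a multiple of three."""
    return deleted_length(ranges) % 3 == 0


report = [
    (cell, deleted_length(ranges), keeps_frame(ranges))
    for cell, ranges in SPLICE_VARIANTS
]
```

Under this reading, the deletions of 114, 321, 174 and 108 nucleotides are multiples of three, whereas those of 221 and 344 nucleotides are not.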



EXAMPLE 12 
Mouse and rat hpa 
EST databases were screened for sequences homologous to the hpa
gene. Three mouse EST's were identified (accession No. AA177901, from
mouse spleen, AA067997 from mouse skin, AA47943 from mouse embryo) and
assembled into an 824 bp cDNA fragment which contains a partial open
reading frame (lacking a 5' end) of 629 bp and a 3' untranslated region of
195 bp (SEQ ID NO:12). As shown in Figure 13, the coding region is 80 %
similar to the 3' end of the hpa cDNA sequence. These EST's are probably
cDNA fragments of the mouse hpa homolog that encodes the mouse
heparanase.

Searching for consensus protein domains revealed an amino terminal
homology between the heparanase and several precursor proteins such as
Procollagen Alpha 1 precursor, Tyrosine-protein kinase-RYK, Fibulin-1,
Insulin-like growth factor binding protein and several others. The amino
terminus is highly hydrophobic and contains a potential trans-membrane
domain. The homology to known signal peptide sequences suggests that it
could function as a signal peptide for protein localization.

The amino acid sequence of human heparanase was used to search
for homologous sequences in the DNA and protein databases. Several
human EST's were identified, as well as mouse sequences highly
homologous to human heparanase. The following mouse EST's were
identified: AA177901, AA674378, AA67997, AA047943, AA690179 and
AI122034, all sharing an identical sequence and corresponding to amino acids
336-543 of the human heparanase sequence. The entire mouse heparanase
cDNA was cloned, based on the nucleotide sequence of the mouse EST's.
PCR primers were designed and a Marathon RACE was performed using a
Marathon cDNA library from 15 day mouse embryo (Clontech) and from
the BL6 mouse melanoma cell line. The mouse hpa homologous cDNA was
isolated following several amplification steps. A 1.1 kb fragment was
amplified from the mouse embryo Marathon cDNA library. The first cycle of
amplification was performed with primers mhpl773 and AP1 and the second
cycle with primers mhpl736 and AP2. A 1.1 kb fragment was then
amplified from the BL6 Marathon cDNA library. The first cycle of
amplification was performed with the primers mhpl152 and AP1, and the
second with mhpl83 and AP2. The combined sequence was homologous to
nucleotides 157 - 1702 of the human hpa cDNA, which encode amino acids
33-543. The 5' end of the mouse hpa gene was isolated from a mouse
genomic DNA library using the Genome Walker kit (Clontech). A 0.9 kb
fragment was amplified from a DraI digested Genome Walker DNA library.
The first cycle of amplification was performed with primers mhpl114 and
AP1 and the second with primers mhpl103 and AP2. The assembled
sequence (SEQ ID NOs:43, 45) is 2396 nucleotides long. It contains an
open reading frame of 1605 nucleotides, which encode a polypeptide of 535
amino acids (SEQ ID NOs:44, 45), 196 nucleotides of 3' untranslated
region (UTR), and an upstream sequence which includes the promoter
region and the 5'-UTR of the mouse hpa cDNA. According to two
promoter predicting programs, TSSW and TSSG, the transcription start site
is localized to nucleotide 431 of SEQ ID NOs:43, 45, 163 nucleotides
upstream of the first ATG codon. The 431 upstream genomic sequence
contains the promoter region. A TATA box is predicted at position 394 of
SEQ ID NOs:43, 45. The mouse and the human hpa genes share an
average homology of 78 % between the nucleotide sequences and 81 %
similarity between the deduced amino acid sequences.
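
The relation between the stated open reading frame lengths and polypeptide lengths can be verified with simple codon arithmetic. The sketch below is illustrative: the helper name is hypothetical, the ORF lengths are taken as excluding the stop codon, and the position of the first ATG is derived on the assumption that the 163 nucleotide distance is counted from the predicted transcription start site to the A of the ATG.

```python
def orf_amino_acids(orf_nucleotides):
    """Number of codons in an ORF whose length excludes the stop codon."""
    if orf_nucleotides % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return orf_nucleotides // 3


MOUSE_ORF_NT = 1605   # mouse hpa open reading frame (from the text)
HUMAN_ORF_NT = 1776   # human hpa open reading frame (Example 8)
MOUSE_TSS = 431       # predicted transcription start site
TSS_TO_ATG = 163      # stated distance from the start site to the first ATG

mouse_residues = orf_amino_acids(MOUSE_ORF_NT)
human_residues = orf_amino_acids(HUMAN_ORF_NT)
mouse_first_atg = MOUSE_TSS + TSS_TO_ATG  # derived position, an assumption
```

The results agree with the 535 amino acid mouse polypeptide and the 592 amino acid human polypeptide described herein.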
A search for hpa homologous sequences, using the Blast 2.0 server,
revealed two EST's from rat: AI060284 (385 nucleotides, SEQ ID NO:46)
which is homologous to the amino terminus (68 % similarity to amino acids
12-136) of human heparanase and AI237828 (541 nucleotides, SEQ ID
NO:47) which is homologous to the carboxyl terminus (81 % similarity to
amino acids 500-543) of human heparanase, and contains a 3'-UTR. A
comparison between the human heparanase and the mouse and rat
homologous sequences is demonstrated in Figure 17.

EXAMPLE 13 
Prediction of heparanase active site 
Homology search of heparanase amino acid sequence against the
DNA and the protein databases revealed no significant homologies. The
protein secondary structure as predicted by the PHD program consists of
alternating alpha helices and beta sheets. The fold recognition server of
UCLA predicted an alpha/beta barrel structure, with under-threshold
confidence.

Five of 15 proteins, which were predicted to have most similar folds,
were glycosyl hydrolases from various organisms: 1xyza - xylanase from
Clostridium thermocellum, 1pbga - 6-phospho-beta-D-galactosidase from
Lactococcus lactis, 1amy - alpha-amylase from barley, 1ecea -
endocellulase from Acidothermus cellulolyticus and 1qbc -
hexosaminidase alpha chain, glycosyl hydrolase.

Protein homology search using the Bioaccelerator pulled out several
proteins, including glycosyl hydrolases such as beta-fructofuranosidase
from Vicia faba (broad bean) and from potato, lactase phlorizin hydrolase
from human, xylanases from Clostridium thermocellum and from
Streptomyces halstedii and cellulase from Clostridium thermocellum.
Blocks 9.3 database pulled out the active site of glycosyl hydrolases family
five, which includes cellulases from various bacteria and fungi. A similar
active site motif is shared by several lysosomal acid hydrolases (63) and
other glycosyl hydrolases. The common mechanism shared by these
enzymes involves two glutamic acid residues, a proton donor and a
nucleophile.

Despite the lack of an overall homology between the heparanase and
other glycosyl hydrolases, the amino acid couple Asn-Glu (NE), which is
characteristic of the proton donor of glycosyl hydrolases of the GH-A clan,
was found at positions 224-225 of the human heparanase protein sequence.
As in other clan members, this NE couple is located at the end of a β sheet.

Considering the relative location of the proton donor and the
predicted secondary structure, the glutamic acid that functions as
nucleophile is most likely located at position 343, or at position 396.
Identification of the active site and the amino acids directly involved in
hydrolysis opens the way for expression of the defined catalytic domain. In
addition, it will provide the tools for rational design of enzyme activity
either by modification of the microenvironment or catalytic site itself.

EXAMPLE 14 
Expression of hpa antisense in mammalian cell lines
A mammalian expression vector Hpa2Kepcdna3 was constructed in
order to express hpa antisense in mammalian cells. The hpa cDNA (1.7 kb
EcoRI fragment) was cloned into the plasmid pCDNA3 in 3'>5' (antisense)
orientation. The construct was used to transfect MBT2-T50 and T24P cell
lines. 2 X 10^ cells in 35 mm plates were transfected using the Fugene
protocol (Boehringer Mannheim). 48 hours after transfection cells were
trypsinized and seeded in six well plates. 24 hours later G418 was added to
initiate selection. The number of colonies per 35 mm plate after 3
weeks was:

            Antisense   No insert
T24P        15          60
MBT-T50     1           6

The lower number of colonies obtained after transfection with hpa
antisense, as compared with the control plasmid, suggests that the
introduction of hpa antisense interferes with cell growth. This experiment
demonstrates the use of complementary antisense hpa DNA sequence to
control heparanase expression in cells. This approach may be used to
inhibit expression of heparanase in vivo, in, for example, cancer cells and in
other pathological processes in which heparanase is involved.

EXAMPLE 15

Zoo blot 

Hpa cDNA was used as a probe to detect homologous sequences in
human DNA and in DNA of various animals. The autoradiogram of the
Southern analysis is presented in Figure 18. Several bands were detected in
human DNA, which correlated with the expected pattern according to the
genomic hpa sequence. Several intense bands were detected in all
mammals, while faint bands were detected in chicken. This correlates with
the phylogenetic relation between human and the tested animals. The
intense bands indicate that hpa is conserved among mammals as well as in
more genetically distant organisms. The multiple band patterns suggest
that in all animals, as in human, the hpa locus occupies a large genomic
region. Alternatively, the various bands could represent homologous
sequences and suggest the existence of a gene family, which can be isolated
based on their homology to the human hpa reported herein. Such
conservation was indeed found between the isolated human hpa cDNA
and the mouse homologue.

EXAMPLE 16 
Characterization of the hpa promoter 

The DNA sequence upstream of the hpa first ATG was subjected to 
computational analysis in order to localize the predicted transcription start 
site and to identify potential transcription factor binding sites. Recognition
of the human Pol II promoter region and start of transcription were predicted
using the TSSW and TSSG programs. Both programs identified a promoter 
region upstream of the coding region. TSSW pointed at nucleotide 2644 
and TSSG at 2635 of SEQ ID NO:42. These two predicted transcription 
start sites are located 4 and 13 nucleotides upstream of the longest hpa 
cDNA isolated by RACE. 
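
The two predictions can be checked for mutual consistency. In the sketch below the positions and offsets are those stated above, while the dictionary and variable names are hypothetical; adding each stated offset to the corresponding predicted position gives the implied first nucleotide of the longest RACE cDNA in SEQ ID NO:42.

```python
# Program -> (predicted start site in SEQ ID NO:42, nucleotides upstream of
# the 5' end of the longest RACE cDNA), as stated in the text.
PREDICTIONS = {"TSSW": (2644, 4), "TSSG": (2635, 13)}

# Implied position of the first nucleotide of the longest RACE cDNA.
implied_cdna_start = {
    program: position + offset
    for program, (position, offset) in PREDICTIONS.items()
}

# Both predictions should point at the same cDNA 5' end.
consistent = len(set(implied_cdna_start.values())) == 1
```

Both predictions imply the same position, nucleotide 2648, so the stated offsets are mutually consistent.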

A hpa promoter-GFP reporter vector was constructed in order to
investigate the regulation of hpa transcription. Two constructs were made,
containing 1.8 kb and 1.1 kb of the hpa promoter region. The reporter
vector was transfected into T50-mouse bladder carcinoma cells. Cells
transfected with both constructs exhibited green fluorescence, which
indicated the promoter activity of the genomic sequence upstream of the
hpa-coding region. This reporter vector enables monitoring of hpa
promoter activity under various conditions and in different cell types, and
characterization of the factors involved in regulation of hpa expression.


Although the invention has been described in conjunction with 
specific embodiments thereof, it is evident that many alternatives, 
modifications and variations will be apparent to those skilled in the art. 
Accordingly, it is intended to embrace all such alternatives, modifications 
and variations that fall within the spirit and broad scope of the appended
claims.

WHAT IS CLAIMED IS:

1. An isolated nucleic acid comprising a genomic, 
complementary or composite polynucleotide sequence encoding a 
polypeptide having heparanase catalytic activity. 

2. The isolated nucleic acid of claim 1, wherein said 
polynucleotide or a portion thereof is hybridizable with SEQ ID NOs: 9, 13,
42, 43 or a portion thereof at 68 °C in 6 x SSC, 1 % SDS, 5 x Denhardt's, 10
% dextran sulfate, 100 µg/ml salmon sperm DNA, and 32P labeled probe
and wash at 68 °C with 3 x SSC and 0.1 % SDS.

3. The isolated nucleic acid of claim 1, wherein said 
polynucleotide or a portion thereof is at least 60 % identical with SEQ ID 
NOs: 9, 13, 42, 43 or portions thereof as determined using the Bestfit 
procedure of the DNA sequence analysis software package developed by 
the Genetics Computer Group (GCG) at the University of Wisconsin (gap
creation penalty - 12, gap extension penalty - 4). 

4. The isolated nucleic acid of claim 1, wherein said polypeptide 
is as set forth in SEQ ID NOs: 10, 14, 44 or portions thereof. 

5. The isolated nucleic acid of claim 1, wherein said polypeptide 
is at least 60 % homologous to SEQ ID NOs: 10, 14, 44 or portions thereof 
as determined with the Smith-Waterman algorithm, using the Bioaccelerator 
platform developed by Compugen (gapop: 10.0, gapext: 0.5, matrix:
blosum62). 

6. A nucleic acid construct comprising the isolated nucleic acid 
of claim 1.

7. A host cell comprising the nucleic acid construct of claim 6. 

8. An antisense oligonucleotide comprising a polynucleotide or a 
polynucleotide analog of at least 10 bases being hybridizable in vivo, under
physiological conditions, with a portion of a polynucleotide strand encoding
a polypeptide having heparanase catalytic activity.

9. The antisense oligonucleotide of claim 8, wherein said
polynucleotide strand encoding said polypeptide having heparanase 
catalytic activity is as set forth in SEQ ID NOs: 9, 13, 42, or 43. 

10. The antisense oligonucleotide of claim 8, wherein said 
polypeptide having heparanase catalytic activity is as set forth in SEQ ID 
NOs: 10, 14 and 44. 

11. A method of in vivo downregulating heparanase activity 
comprising the step of in vivo administering the antisense oligonucleotide of 
claim 8. 

12. A pharmaceutical composition comprising the antisense 
oligonucleotide of claim 8 and a pharmaceutically acceptable carrier. 

13. A ribozyme comprising the antisense oligonucleotide of claim 
8 and a ribozyme sequence. 

14. An antisense nucleic acid construct comprising a promoter 
sequence and a polynucleotide sequence directing the synthesis of an 
antisense RNA sequence of at least 10 bases being hybridizable in vivo,
under physiological conditions, with a portion of a polynucleotide strand 
encoding a polypeptide having heparanase catalytic activity. 

15. The antisense nucleic acid construct of claim 14, wherein said 
polynucleotide strand encoding said polypeptide having heparanase 
catalytic activity is as set forth in SEQ ID NOs: 9, 13, 42 or 43. 

16. The antisense nucleic acid construct of claim 14, wherein said 
polypeptide having heparanase catalytic activity is as set forth in SEQ ID 
NOs: 10, 14 or 44. 

17. A method of in vivo downregulating heparanase activity 
comprising the step of in vivo administering the antisense nucleic acid 
construct of claim 14. 
18. A pharmaceutical composition comprising the antisense 
nucleic acid construct of claim 14 and a pharmaceutically acceptable 
carrier. 

19. A nucleic acid construct comprising a polynucleotide 
sequence functioning as a promoter, said polynucleotide sequence is derived 
from SEQ ID NO:42 and includes at least nucleotides 2535-2635 thereof or 
from SEQ ID NO:43 and includes at least nucleotides 320-420. 

20. A method of expressing a polynucleotide sequence 
comprising the step of ligating the polynucleotide sequence into the nucleic 
acid construct of claim 19, downstream of said polynucleotide sequence 
derived from SEQ ID NOs:42 or 43. 

21. A recombinant protein comprising a polypeptide having 
heparanase catalytic activity. 

22. The recombinant protein of claim 21, wherein said 
polypeptide includes at least a portion of SEQ ID NOs: 10, 14 or 44. 

23. The recombinant protein of claim 21, wherein the protein is 
encoded by a polynucleotide hybridizable with SEQ ID NOs: 9, 13, 42, 43 
or a portion thereof at 68 °C in 6 x SSC, 1 % SDS, 5 x Denhardt's, 10 % 
dextran sulfate, 100 µg/ml salmon sperm DNA, and 32P labeled probe and 
wash at 68 °C with 3 x SSC and 0.1 % SDS. 

24. The recombinant protein of claim 21, wherein the protein is 
encoded by a polynucleotide at least 60 % identical with SEQ ID NOs: 9, 
13, 42, 43 or portions thereof as determined using the Bestfit procedure of 
the DNA sequence analysis software package developed by the Genetics 
Computer Group (GCG) at the University of Wisconsin (gap creation 
penalty - 12, gap extension penalty - 4). 

25. A pharmaceutical composition comprising, as an active 
ingredient, the recombinant protein of claim 21. 

26. A method of identifying a chromosome region harboring a 
heparanase gene in a chromosome spread comprising the steps of: 
(a) hybridizing the chromosome spread with a tagged 
polynucleotide probe encoding heparanase; 

(b) washing the chromosome spread, thereby removing excess of 
non-hybridized probe; and 

(c) searching for signals associated with said hybridized tagged 
polynucleotide probe, wherein detected signals are 
indicative of a chromosome region harboring a heparanase 
gene. 

27. A method of in vivo eliciting anti-heparanase antibodies 
comprising the steps of administering a nucleic acid construct including a 
polynucleotide segment corresponding to at least a portion of SEQ ID 
NOs:9, 13 or 43 and a promoter for directing the expression of said 
polynucleotide segment in vivo. 

28. A DNA vaccine for in vivo eliciting anti-heparanase 
antibodies comprising a nucleic acid construct including a polynucleotide 
segment corresponding to at least a portion of SEQ ID NOs:9, 13 or 43 and 
a promoter for directing the expression of said polynucleotide segment in 
vivo. 
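
Claims 8 and 14 recite an antisense sequence of at least 10 bases that hybridizes with a portion of a heparanase-encoding strand. Purely as an illustration, the following Python sketch derives a candidate antisense sequence as the reverse complement of a window of a target strand. The function name, its arguments and the example target (the first bases of the coding region shown in Figure 16) are assumptions made for this example. The sketch checks only base complementarity and the 10-base minimum; it says nothing about hybridization under physiological conditions.

```python
# Hypothetical helper for illustration; not taken from the specification.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
MIN_LENGTH = 10  # minimum antisense length recited in claims 8 and 14


def antisense_oligo(target: str, start: int, length: int) -> str:
    """Return the reverse complement (5'->3') of target[start:start+length]."""
    if length < MIN_LENGTH:
        raise ValueError(f"oligonucleotide must be at least {MIN_LENGTH} bases")
    window = target.upper()[start:start + length]
    if len(window) < length:
        raise ValueError("window extends past the end of the target")
    # Complement each base while walking the window from its 3' end.
    return "".join(COMPLEMENT[base] for base in reversed(window))


# Example target: start of the coding region as printed in Figure 16.
target = "ATGCTGCTGCGCTCGAAGCCTGCG"
print(antisense_oligo(target, 0, 10))  # GCAGCAGCAT
```

A real design would also weigh target accessibility, specificity and stability, none of which is modeled here.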
Fig X 

1 CTAGAG CTTTCGACTCTCCG CTGCGCGGCAGCTGGCGGGGGGAGa^GCCAGGTGAGCCC^ 




GCGCAAGCACAGGACCTOnT^ 
PGAL PRPAQAQOVVD 

lei ACCT(X5A L " i ' lV " lU^: ACCCAOGAGCOG K L'l\JCACCTO^ 

LDFFTQEPLHLVS PSPLSVT 

24 1 CCATTGACGCCAACCTCGCCaCGGACCPSCGtfri^^^ 

IDANLATDPRFLII.'l*"G6PKL 

RT LARGLSPAYLRFGGTKTD. 

361 ACTTOCTAATTTTOGATCCCAAGAAOGAATCAACCTTTGAAGA^ 

PL I FD PKKB ST F E B R GTtTQS 

421 CTCAAGTCAACCAOGATATTTOCAAATAT(K»TCCATC^ 

QVNQDICKYG. S I P PD VBBKL 

481 TACncrrroGAATWCXCTACXaGGAGCAATTGCTACrCCGAG^ 

RLBHPYQEQLLLRRHyQKKF 

541 TCAACSAACAGCACCTACTCAAGAAGCTCTGTAGATGTGCTATACACTTT^^ 

KNSTYSRSSVDVLYTFAN CS 

601 CAGGACIXXy^CTITGATCrTTGGCCTAAATGCXrrTATTAAGAAC^ 

GLDLI FGLNALLRTADLQHN 

661 ACAGTl'CTAATCCTCAGTTGCTCCTGGACTACTtXrrCTT^ 

SSNAQLLLDYCSSKGYWISW 

721 GGGAACTA(MCAATGAA(XTAAOVGTTTCCTTAAGAAGGCTX»TATTTTC^ 

ELGNEPKSFLKKADXFINGS 
'T> 

- £ I C:;CAGTTAGGAGAAGATTATATTCAATTGC:=iTAAACTTCTAAGAA.^GTCrACCTTCAAA^^ 
C L- G E D y i Q L H K L ?. K S T F K i; 

(FJ 

841 ATGCAAAACTCTATGGTCCTGATCTTGGTCAGCCTCGAAGAAAGACGGCTAAGATGCTG 
AKLYGPDVGQPRRKTAKMLK 

901 AGAGCTTCCTGAAGGCTGGTGGAGAAGTGATTGATTCAGTTACATGGCATCACTACT 

EFLKAGGEV I DSVTWHHYYL 

961 TGAATGGACGGACroCTACCy^GGGAWSATTTTCTAAACCCTGATGTATTGC^ 

KGRTATREDFLrNPDVLDIFI 

1021 TTTCATCTGTGCAAAAAGTTTTCCAGGTGGTTGAGAGCACCAGGCCTGGCSUl(^^ 

SSVQKVFQVVESTRPGKKVH 

1081 GGTTAGGAGAAACAACCTCTGCATATGGJU3GCOCSAGCGCCCTTGCTATC^^ 

LGETSSAYGGGAPLI.SDTFA 

1141 CAGCTGGCrrrrATGTGGCTGGATAAATTGGGCCTGTCAGCCCGAATGGGA^ 

AG FMWLDKLGLSARMGIEVV 

1201 TGATGAGCCAACTATTCTTTGGAGCAGGAAACTACCATTTAGTGGATGAAAACT^ 

MRQVFFGAGNYHLVDENPDP 

1261 CTTTACCTCATTATTGCCTATCTCrTCi XJU't'CAAGAAATTGg 

LPDY HLSLLFKKLVGTKVLM 

1321 TGGCAAGCGTGCAAGGTTCAAAGAGAAGGAAGCTTCGAGTATACCTTCATT^ 

ASVQGSKRRKLRVY1.HCTNT 

1381 CTGACAATCCAAGCTATAAAGAAGGAC^TTTAACTCTGTATGCCATAAACC^^ 

DM PRYKEGDLTLYAINLHHV 

1441 TCACCAAGTACTTGC3GGTTACCCTATCCTTTTTCTAACAAGCA^^ 

TKYLRLPYPFSNKQVDKYLL 

1501 TAAGACCTTTGGGACXrrtyVTGGATTACTTTCCAAATCT^^ 

RPLGPHGLLSKSVQLHGLTL 

1561 TAAAGATGGTGGATGATCAAACCTTGO^CCTTTAATGGAAAAACCT^^ 

KHVDDQTLPPLMEKPLRPGS 

1621 GTTCACTGGGCTTCCCAGC'J'i'iViXJVTATA UT ' l ' l ' iUUUXJ ' iXi ATAAGAAATC^^ 

SLGL.PAFSYSFFVIRNAKVA 

1681 CTCCTTGCATCTGAAAAT AAAATATACTAGTCCTGACACTG 
A C 1 
FIG. 3A (axis label: Fraction) 

FIG. 3B (axis label: Fraction; ticks 10 to 50) 

WO 00/52178    PCT/US00/03542 

FIG. 5A 
FIG. 7A 

FIG. 8A (axis label: Fraction) 

FIG. 8B (axis label: Fraction; annotation: peak II) 

FIG. 9A 

FIG. 10B 

FIG. 12A 
Fig. 13 

mouse CTGGCAAGAAGGTCTGGTTGGGAGAGACGAGCTCAGCTTACGGTGGCGGT 50 
human CTGGCAAGAAGGTCTGGTTAGGAGAAACAAGCTCTGCATATGGAGGCGGA 1115 

mouse GCACCCTTGCTGTCCAACACCTTTGCAGCTGGCTTTATGTGGCTGGATAA 100 
human GCGCCCTTGCTATCCGACACCTTTGCAGCTGGGTTTATGTGGCTGGATAA 1165 

mouse ATTGGGCCTGTCAGCCCAGATGGGCATAGAAGTCGTGATGAGGCAGGTGT 150 
human ATTGGGCCTGTCAGCCCGAATGGGAATAGAAGTGGTGATGAGGCAAGTAT 1215 

mouse TCTTCGGAGCAGGCAACTACCACTTAGTGGATGAAAACTTTGAGCCTTTA 200 
human TCTTTGGAGCAGGAAACTACCATTTAGTGGATGAAAACTTCGATCCTTTA 1265 

mouse CCTGATTACTGGCTCTCTCTTCTGTTCAAGAAACTGGTAGGTCCCAGGGT 250 
human CCTGATTATTGGCTATCTCTTCTGTTCAAGAAATTGGTGGGCACCAAGGT 1315 

mouse GTTACTGTC;iAGAGTaii^&AGGCCCAGACAGGAGC.^^-^:^.CTCCGAGTGTATC 300 
human GTTAATGGCAAGCGTGCAAGGTTCAAAGAGAAGGAAGCTTCGAGTATACC 1365 

mouse TCCACTGCACTAACGTCTATCACCCACGATATCAGGAAGGAGATCTAACT 350 
human TTCATTGCACAAACACTGACAATCCAAGGTATAAAGAAGGAGATTTAACT 1415 

mouse CTGTATGTCCTGAACCTCCATAATGTCACCAAGCACTTGAAGGTACCGCC 400 
human CTGTATGCCATAAACCTCCATAACGTCACCAAGTACTTGCGGTTACCCTA 1465 

mouse TCCGTTGTTCAGGAAACCAGTGGATACGTACCTTCTGAAGCCTTCGGGGC 450 
human TCCTTTTTCTAACAAGCAAGTGGATAAATACCTTCTAAGACCTTTGGGAC 1515 

mouse CGGATGGATTACTTTCCAAATCTGTCCAACTGAACGGTCAAATTCTGAAG 500 
human CTCATGGATTACTTTCCAAATCTGTCCAACTCAATGGTCTAACTCTAAAG 1565 

mouse ATGGTGGATGAGCAGACCCTGCCAGCTTTGACAGAAAAACGTCTCCCCGC 550 
human ATGGTGGATGATCAAACCTTGCCACCTTTAATGGAAAAACCTCTCCGGCC 1615 

mouse AGGAAGTGCACTAAGCCTGCCTGCCTTTTCCTATGGTTTTTTTGTCATAA 600 
human AGGAAGTTCACTGGGCTTGCCAGCTTTCTCATATAGTTTTTTTGTGATAA 1665 

mouse GAAATGCCAAAATCGCTGCTTGTATATGAAAATAAAA 637 
human GAAATGCCAAAGTTGCTGCTTGCATCTGAAAATAAAA 1702 
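
The degree of conservation shown in an alignment such as Fig. 13 can be summarized as a percent identity. The Python sketch below is a minimal, hypothetical illustration that counts matching positions between two equal-length, already-aligned rows. It uses the second row pair of Fig. 13 (mouse 51-100, human 1116-1165) as input. It performs no alignment of its own and applies no gap penalties, so it is not the Bestfit or Smith-Waterman procedure recited in the claims.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions at which two pre-aligned, equal-length rows agree."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    if not a:
        raise ValueError("rows must not be empty")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


# Second row pair of Fig. 13 (mouse 51-100, human 1116-1165).
mouse = "GCACCCTTGCTGTCCAACACCTTTGCAGCTGGCTTTATGTGGCTGGATAA"
human = "GCGCCCTTGCTATCCGACACCTTTGCAGCTGGGTTTATGTGGCTGGATAA"
print(percent_identity(mouse, human))  # 46 of 50 positions match: 92.0
```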
Figure 16 

ggatcttggctcactgcaatctctgcctcccatgcaattcttatgcatca 50 

gcctcctgagtagcttggattataggtctgcgccaccactcctggctaca 100 

ccatgttgcccaggctggtcttgaactcttgggctctagtgatccacccg 150 

ccttggcctcccaaagtgctgggattacaggtgtgagccatcacacccgg 200 

ccccccgtttccatattagtaactcacatgtagaccacaaggatgcacta 250 

tttagaaaacttgcaatggtccacttttcaaatcacccaaacatgttaaa 300 

gaaattggtatgactgggcatggcacagtggctcatgcctgcaatcctag 350 

cattttgtgaggctgagacgggcagatcacgaggtcaggagattgagacc 400 

atcctgacagacatggtgaaatcccatctctactaaaaatacaaaacaat 450 

tagccgggggtgatggcaggcccctgtagtcccagctactcgggaggctg 500 

aggcaggagaatggcgtgaatccaggaggcagagcttgcagtgagccgag 550 

atggtgccactgcactccagcctgggcgacagagcgagactccgtctcaa 600 

aaaaaaaaaaaaagaaagaaattggtatgactgttgactcacaacaggag 650 

tcaggggcatggggtggggtgtaagattaatgtcatgacaaatgtggaaa 700 

agaaacttctgtttttccaactccacgtctgctaccatattattacactc 750 

ttctggtagtgtggtgtttatgtgtgaattttttttcatatgtatacagt 800 

aattgtaggatatgaacctgattctagttgcaaaactcactatgagctta 850 

gcttttaagttgcttaagaataggtagatctatgcaaataatgataatta 900 

ttattattattttaagagagggtctcactttgtcacccaggctggagtgc 950 

agtggtgtgattaagggtcactgcaacctccacctcccaggctcaaataa 1000 

acctcccacctcagcctccccagtagctggaaccacaggcacgggccacc 1050 

acgcctggctaattttttgtattttttgtagagatggggtttcatcatgt 1100 

tgcccaggctgttcttgaattcctcggctcaagcaatcctcccaccttgg 1150 

cctcccaaaatgctggcatcacaggcatgatggcatcactggcatcacat 1200 

accatgcctggcctgatttatgcaaattagatatgcatttcaaaataatc 1250 

tatttttatttgttgccttattggtggtacaatctcaagtggaaaaatct 1300 

aagggttttggtgttatttgcttactcaaccaatatttattagactctta 1350 

ctaagcaccaacatgatcacatgcctgagctatggctagcatagcgtgtg 1400 

agacaaacttaatctctgttttggtggagcatataatctagtagatgaag 1450 

ccaatgttgagcaacatcacaatactaacaaattgaggatgctacgagag 1500 

tgtctaacaaattgaggatgctacgagagtgtctaacaaattgaggatgc 1550 

tatgagagtgtgtcatggagagctgcctggagattgagagaaagcttcct 1600 

tgagggaagttacatttcagctgaaacacactgccatctgctcgaggttt 1650 

tgtaactgcattcacatcccgattctgacacttcacatcccgattctgac 1700 

acttcacccagttactgtctcagagcttgggtccgcatgtgtaaaacaag 1750 

gacagtatgcacttggcagggttgtgagaagggaagagaacacaagtaaa 1800 

gcacctgtatcaggcatacagtaggcactaagcgtgcgatgcttgctatg 1850 

attatacatcagtgtaagcatcaaggaaaagctgaagaaaagtctgacca 1900 

acagcgaaagataaatgcgcagaggagaaatttggcaaaggctccaaatt 1950 

caggggcagtccgtactctacactttgtatgggggcttcaggtcctgagt 2000 

tccagacattggagcaactaaccctttaagattgctaaatattgtcttaa 2050 

tgagaagttgataaagaattttgggtggttgatctctttccagctgcagt 2100 

ttagcgtatgctgaggccagattttttcaagcaaaagtaaaatacctgag 2150 

aaactgcctggccagaggacaatcagattttggctggctcaagtgacaag 2200 

caagtgtttataagctagatgggagaggaagggatgaatactccattgga 2250 

ggttttactcgagggtcagagggatacccggcgccatcagaatgggatct 2300 

gggagtcggaaacgctgggttcccacgagagcgcgcagaacacgtgcgtc 2350 

aggaagcctggtccgggatgcccagcgctgctccccgggcgctcctcccc 2400 

gggcgctcctccccaggcctcccgggcgcttggatcccggccatctccgc 2450 

acccttcaagtgggtgtgggtgatttcgtaagtgaacgtgaccgccaccg 2500 

aggggaaagcgagcaaggaagtaggagagagccgggcaggcggggcgggg 2550 

ttggattgggagcagtgggagggatgcagaagaggagtgggagggatgga 2600 

gggcgcagtgggaggggtgaggaggcgtaacgggGCGGAGGAAAGGAGAA 2650 

AAGGGCGCTGGGGCTCGGCGGGAGGAAGTGCTAGAGCTCTCGACTCTCCG 2700 

CTGCGCGGCAGCTGGCGGGGGGAGCAGCCAGGTGAGCCCAAGATGCTGCT 2750 

M L L 

GCGCTCGAAGCCTGCGCTGCCGCCGCCGCTGATGCTGCTGCTCCTGGGGC 2800 

RSKPALPPPLMLLLLG 

CGCTGGGTCCCCTCTCCCCTGGCGCCCTGCCCCGACCTGCGCAAGCACAG 2850 
PLGPLSPGALPRPAQAQ 
GACGTCGTGGACCTGGACTTCTTCACCCAGGAGCCGCTGCACCTGGTGAG 2900 

DVVDLDFFTQEPLHLVS 

CCCCTCGTTCCTGTCCGTCACCATTGACGCCAACCTGGCCACGGACCCGC 2950 

PSFLSVTIDANLATDP 

GGTTCCTCATCCTCCTGGGgtaagcgccagcctcctggtcctgtcccctt 3000 
R F L I L L G 

tcctgtcctcctgacacctatgtctgccccgccagcggctctccttcttt 3050 

tgcgcggaaacaacttcacaccggaacctccccgcctgtctctccccacc 3100 

ccacttcccgcctctcattctccctctccctcccttactctcagacccca 3150 

aaccgctttttggggggtatcatttaaaaaatagatttaggggttacaag 3200 

tgcagttctgttccatgggtatattgcattgtggtggcatctgggctctt 3250 

agtgtaactgtcacccgaatgttgtacattgtatctaataggtaatttct 3300 

catccctcatccctctcccaccctcccaccttttggagtctccagtgtct 3350 
actattccactaagtccatgtgtacacattgtttagcgcccactctaaat 3400 

gagcctttttgtttcattcattctgtaagtgttgaataggcaccacctaa 3450 

ggtcaggtataagtggaaatttgaaaaagaaactgcccacttgccccagt 3500 

acttccctagccaagaggagggaaaccaggcaggtgcacctgaaggcctg 3550 

tgagtgcttgatttgctgtgcagtgtaggacaagtaagattgtgcatagc 3600 

cttctgtatttaagactgtgttaggaagatttctctttcttttcttttct 3650 

ttttcttttttcttttcttttttttttttaggcagatgaaaagggcgtca 3700 

cagaacaggaataaaaatctaaatattcaataaatgagacctaggagact 3750 

actgcagtgacttacaaagtcctaataaaaagatgtctctccaaaatggg 3800 

gctgcaaaatgtggtgctgccttatcagctctaagttttttccttacctg 3850 

agaaagaaggaacctgatgcaggttcagggctcctgccccatgaatgcag 3900 

gctgactccaagatggggagctacagggacaatcccaggtcttctaggcc 3950 

tcttatttaggccctgggagcctccagagatggccacatcttgaccagcc 4000 

cagatagagggaaagatcaccattatctcacctctgtgtcaaatacctag 4050 

atgctgtcctccctgagcccacactatagttgccagcgctaatttaatgg 4100 

gtagtgtactggttaagagatggacagaccatcctggcttgactctcagc 4150 

tctggcaaagatgagtgacttggtttttccatatctcttggccacaccaa 4200 

ccttgatttcttcagctgtagaatggaatttctcaagcttgcctcaagga 4250 

ttattgcccgaggatttgatgatatggtaagagcttctcagtgtttgacc 4300 

catagtaagtgtttgacgtttcaaacgaattgtttctttctaggacatgg 4350 

tgagcatttggtagccattcaccggttttctgtttctttggatcatagtt 4400 

aacctctccttttccttctggcactacaattttctggtggggaagaatcc 4450 

ttacttt-tgcc:rttcccctt:aaggataggaagctg2tactaggcagca= 4500 

ctagttgggggataggaagattgttccagagaaatgctgaaccatagggc 4550 

tccagatcacaggaccccagtcttagcttgctggggtgtggggtgggggg 4600 

gggcggttactgaacatgggtatgaagtagatgtccatttactgaaatgt 4650 

gaggacctgaggcctcttctattgctgtagccagcatattccccaacctc 4700 

tccccaagaaaggacagatgggggttcccccctggagtaacaggtccaaa 4750 

agaaaaaacatacagtgggacttccaggatctgggcctgatcacccagca 4800 

gtcaagctccccgcaattgactaacacccccctaacacgtagaaattcca 4850 

atctgcaatttagtgaggatgatacctttattcttcttaaatacatctct 4900 

tcatttcccagagcacccttttttcccctcctctgcacctttttgttaaa 4950 

gactggagtataatgaaataccaagagagcataacatgtgatacataaaa 5000 

ctttttttctggtttacaaaacagttcattcttgtccatacgtgcttctc 5050 

tccaaggctggctgctgtctgttccagcccgcttcgcttggagaggccat 5100 

ctgccatacctgctccccagacgcatcgacaagcacacccagagtgttat 5150 

ctgctaagacctaaaagagggaggaaccccctctcctcatctaagaccta 5200 

gcttctaaattagagtgtgagggtccatctccccaggaggggcacagggc 5250 

ccaaacagcccagccatctcagaagacaacactaagctttgtaggggtcc 5300 

acagtagaggagagtaagacgcctgttgtttaatttattacagttcctca 5350 

aaagtgaagatgtgtgggcgggatggcaagagctgagcagacgaaagctg 5400 

aaggaataaggaaagagaggaggacacaaacagctgacacttcctcagtt 5450 

cttgtcatttgcctggccctgttctaagcaccttctaggtattaatccat 5500 

ttagtcttggctacaacactgtgagtaactagttttgtcacccccatttt 5550 

aaaaatgaagaaagtgaggctcagggaggttaagtaacttggccacagtt 5600 

tgaaactagactctgatcacatgagataatagtgcccataaaaagggaaa 5650 

gcagattatattttttaaaggaaagagagtaggatatggtagaaaaagat 5700 



Fig. 16 (continued) 

tgtttggaaaggaattgagagattgatataatgaaaagaagcattcacat 5750 

gagagtaacagtatcagggcccaaaccttcatctaaggtacttcaaagag 5800 

gcctaagcaaacttagtcactggcgtggttctagtctccatgatggcaaa 5850 

tacattgtgtacagcccaactccacacaaaacttaaataccaatgataga 5900 

gcaatctaaaatttgaaagaaaaaatctttcaatttgtcgtcttcccaga 5950 

gggacttaatcaagaaaccaatcaaaatacttcctaagcctaactgtgtg 6000 

cagaactccaaagagagcccagccctaaatcaacactgtccaatggaaa^ 6050 

ataatataatgtgggcctcatatgcaaggtcatatgtaattttaaatttt 6100 

ctagtagccatattaaaaaggtaaaaagaaacaagtgaaattaattttaa 6150 

taattttatttagttcaktagatccaaaatgttttctcagcatgtaatca 6200 

atataaaaatattaatgaggtatttattattccttttctcaaaccaagtc 6250 

tattctataatctggcgtgtattatttacagcacttctcagactatattt 6300 

ctttctttcttttttttttccgagacaattttgctcttgtcacccaagct 6350 

agagtacaatggcgttacctcggctcactgcaacctccgcctcccgggtt 6400 

caagttattctcctgcctcagtctcccaagtagctgggactagaggcatg 6450 

caccaccacgcctggctaattgtgtatttttagtagagacagggtttcac 6500 

catgttggccaggctaatctcaaactcctgagctcaggtgatatgcccac 6550 

ctcggcctcccaaagtgttgggattacaggcgtgagccactgcacccggc 6600 

ctcagattaactatatttcaagcgttcagtagccacatgtagctagtgct 6650 

atggtagtggacagtacagatctgcatttcaattaagacacgtatacaag 6700 

catagttcactaatgcacggtaaaaaaaagtatagtgctgagtcggtggt 6750 

agaaatcctaaatactgcagagcaaaagtggtacgaacagcaatctcagt 6800 

gataatgcaaccatgcttgcttttcattgcaatttgcttattttccttca 6850 

gcaaagttcatccatttttgccaattcaataaatatttactgataaaaac 6900 

tttcaatattagattcttgcatcttcatagacagagttgcttttcacatt 6950 

tagaaaattacttatcaatgttaaacacacgttttgataaccagtgttgg 7000 

aaagaggtgcagactccccatgtgcctattgatggcagaaatattcacag 7050 

ccaaagggaaacaaagggctggggacaatcacacacctcatgtctcctaa 7100 

ctcctgggaagtgctgtccctctgattgagctcttattattgccttcccc 7150 

actaaccctgtccactgtgccctggagccctttgcagggttacctgctct 7200 

gtcctcctcacagaatatctcctctacctccttgtccaagctacaacttg 7250 

gctattctctgatgacactgtcttccctgtagcccttttgagtaatggct 7300 

gcatattctcccatagtccagttcttttcctgttctccagtctggcttct 7350 

ggatgacagcccactagtttgaactccatactgctatagttcaagtccct 7400 

tttgacttgttaccttgggcaaattacctccttttgttcaggttccttgt 7450 

ttgtaaaatgacgataataatgccatttgcttcagtgggttattttgaaa 7500 

ttgagtgaaagaaggcgggtagcttccctacacgctcagtgtagactagc 7550 

ctgatgtgcattacgggtgatgccatgactcagtgtgttttcctcatctc 7600 

cacatctggctctcatccagtgctcctgcttacggcactctgtccccctc 7650 

ttacttactcccccttattaactgaagactggcactgatctcacagtttc 7700 

ctctccacttcctagtctcaccatcatcctagatgacttcaagtcaccta 7750 

gataaactgtctcagtttcttcactcacatttttttataacagataatgt 7800 

tacactcaagttgtaacagaaccagcttatccagctcatgaaatgtatgc 7850 

atttcatctcaactctgtattcagtgacatcctgtgggtatctggaaatc 7900 

agccatggtgagaatatttaccatggaaattggcaaatactaaaaagcag 7950 

agcacctttttttctgagagccagaccatagctcttctactccatagcac 8000 

ccatcataacaatttttaaatacctccactgaacagcttcttcctctctc 8050 

tacttcttccatatctgatttgagcttcttaatttatcatgtgaaccact 8100 

cttgtaataataaccccaaatccctgttccattgttcttcctgctaaaat 8150 

actaaacctggtttagtccaaccatattttctctctttggaatctacagg 8200 

gtggcccaaaaacctggaaatggaaaaatattacttattaattttaatgt 8250 

atattaataagccattttaatgcttcatttccagtctcagtggccaccct 8300 

gtatagctgggctattgagctcttgcgggaggagggagtggacagtctcc 8350 

cagccacacagactgatgttgcaccaaacattttttagcttccagacttc 8400 

cctggcccttagtgttacccttaactctccatttctctgcctttcacatt 8450 

ctctactttttaaaaatctctgactccaccttcaccttatcattcttagc 8500 

acatgaccatacttctgcttcccaaagaaaatgagcaattacttcctttt 8550 

ccttttcctcctgtcatcaaatctgcagacatgtcatgcctaagtccagc 8600 

tttcctcctttctctgatctcagtctgcttcttccatttctgccctgaat 8650 

cccgtcccctccccaacccccaaggacttcgctctatcagtcacctcttc 8700 

cctctcctgtatcttcaactcctcccattttactggcttcttcctcaagc 8750 
ctttccccaagcctttcccatctcaattacctcctcgcacatgcctctgc 8800 
agaaaccaccccgtttcttccctcccctcggcagcctgttcttcctgttc 8850 
tgccctcatgatggcaccatcattgtgtcactaaaatcaatctctccgac 8900 
atcatcaatggccttcctttgttgggaaacctaataaacactttatctta 8950 
tttggtctttgttatgggttgaatgaggttaccccgaaatccatattaga 9000 
agtcctaacccccagtacctcagaatgtgactttatttgggaatagggtc 9050 
attgcagacgttattagttaggatgaggtcatactggaatgtgatgggct 9100 
gcttatctaatatgactgatgtccttataacaaggagaaatttggagaca 9150 
gacacgcacatagggagaataccatgtgatgacaggagttatggagttgg 9200 
agtcaaaaagctatgggaacttaggagaaagacctggaacaaatcctttc 9250 
ctgcgcctagagagggagtatggccctgccactaccttgaattcaacgtt 9300 
tcggcttttcaaaactgtaagacaatacatttctgttgttcaaaccaatt 9350 
agtttgcagtactctgcgactgcagccctaacaaactaatacagtctctt 9400 
ggaggcatttggcaaggttgacaatggaagcactttcttacccctttagg 9450 
tctgtcgcctttcttgttggggggtgttttctaacaattcctctccatct 9500 
ctctctctctagtttgtcttaaacattggtgttcttcagacttctgacct 9550 
aggccttcttttcacttcacatattcccctgggtggtctcacccacttcc 9600 
agaaattacttaaattactgctcatgcagtactgtgctggaaactgttta 9650 
acaactggctctctgggaagaggggagactggttgatggtttttgctgat 9700 
ttctgtggtgtaaatactccctccatggccaattccaaactgccaacagt 9750 
ttaacaactggctcacaaattttctccaaatttaacatttggctttcaca 9800 
ggccaacaacgtggtacagccaactccagcacacctctgcttttgtgtca 9850 
gagagaagtaacttatttttgtacaaaaggtaaaataaaaacacctgcag 9900 
gccccctttttttccttaacaaactgctctagaaatagaatagctgaagc 9950 
ttcttttatgcattcatctgttatttccatgtcactgtggtggtgggatt 10000 
atttttcctttatttttcttgtatatggttgaaatactgtacctttgatc 10050 
agttttagttttatggcatgttttgcacccatattaaatctagtttttgt 10100 
cagagggcgtcaatattattttctcaaaacaagaaaatatttcattgcaa 10150 
aggagacaaacaaaaaggtccttaataccaaaactttgaaatgtgatttc 10200 
ttgtacttggcagtgtccaagtggtaaacccaaacagtattgggttttca 10250 
ttttgttcaggaaagtctttgtctggcagcgacttacccttacatcaggc 10300 
gggccttgctcattcattcacttaagtatttattaaacaccagcggtgtg 10350 
ccaagtacttatctaggtatcgggtagattctgataagtcagtcaggtcc 10400 
ctgctctcagggagcttgcagcagagatgggggctgcaatagagagtaag 10450 
ccaaggaaatgaaaaaggaagttgatttcagagagtgatgaatgctatga 10500 
agaaaatgaaggcagcgcagtgtgatggagagtgacccaaggtgatacac 10550 
tttgtacctctaaggaccagactgtgacccaggtcactcacagatgcccg 10600 
tcatgtgatgccacagcaacttttccaggtgctcgtttcctcccacttcc 10650 
cagtctcttgcccagccgcgactgcttacaaatacagctagaggaatcta 10700 
aatgaggttcctctatcatcaaacccaatcaaaatgccaaggaacagaat 10750 
cagtgcctggctgaaggcagtggaacagggccagcctggagtggttctct 10800 
ctgaggaagttcctcatcttggttttagggccataccttgtgacctgtga 10850 
gctaggggttgccagtccctgacatttctactgaggactcgcctgtctat 10900 
attcccggcctgtatgtgtctcctgagttccagacacacagggcgaagcg 10950 
cctgatggatggaagtatgttttttggtgttccattggtatctcaaattc 11000 
tacaaaacttagtgccccttctcctccctgttcctccccatcttcagtct 11050 
atcacctgttcctcatccagcaaatgatattaccatcttccaaggagctt 11100 
cccaggagtaatccttgactcctcctcaacatccaattaataatcaaatc 11150 
taggccaggtacaatagctcacgcctataatcccagcactttgggaggct 11200 
gaggcaggtggatcatttgaggccaggagttcaagaccagcctggccaac 11250 
aaggtgaaacctgtctcatttaaaaaaagttattttaaaaactcaaatct 11300 
attatttctacctctaagtgtgtcttgaatttatccatctctctccatct 11350 
ctgagctgttaccttacctcagtccatcacgttttgtctacgttaacatg 11400 
accagagtcttgttcttagtctggtgaggtcactccagctgcttcagatc 11450 
cttccatggctcaccgttgccctcatataaagttggcactcctggacatg 11500 
tggcttacggggccctccgtgatgtggccctatttgcttctccattctgt 11550 
tctctcccagcctctctgcccccatctctaggcaccaaccacacccttct 11600 
gctcgtcaatggtgccagcttctcttctatctctggtctttggacagact 11650 
tttcccttcacctggaatgctttcttcaatcctaccccactctctttaat 11700 
ctagataaggtttattctttttgaatgtctagcagtgaaaccatttcccc 11750 
tgaaaaaccttctctaaccaaccccctaccctcagcccaaggtctagatt 11800 
aggagtccctctgaatgtttccatagcatttttaaagaattgcctattta 11850 
cttgttcgtatctatcactaaactacaaattgtatgagaacagccactat 11900 
ctctgcctggttcaccattcatctccagcaactagcataatgcctggcag 11950 
agtcagcctgcaacaaatatttgttgaataaattaacagatggctttatc 12000 
tccttaagtaaatcttgcttttttcacctattaaaacagacgcacaggcc 12050 
aggtgtggtggcccatgcctgtaatcccagcactttggcaggctgaggtg 12100 
ggcggatcacctgaggtcaggagttcaagaccagcctggccaacatggtg 12150 
aaaccccatctctaataaaaatacaaaaattagctgggcatggtggtggg 12200 
tgcgtatagtcccagctactagggaggctgaggcaagagaatcgcttgaa 12250 
cccaggaggcaga^rgtggcagtgagccgagatcatgccactgtactccag 12300 
cctggatgacagagaccctgtctcaaaacacacacacacacacacacaca 12350 
cacacacacacacacacacacacacacaccaagttgtataatttaaaata 12400 
taacgtgcttgttatggaacacttgtaaaatacaggaaagtaatgaaaaa 12450 
gtctaccatctagctcaccacataatgaccattgctatcatcctggcata 12500 
attctctcctgtatataaatatatattcttttattgttaaaattacacta 12550 
tgagtactatttatttattttactgtggcaaaatgcgcaaaacataaaat 12600 
cttgccattttaaggtatgcagtttggtgcattcaccacactcacattgt 12650 
tgtgcaaatatcaccactatctatctcagaacttcttcgtcttcccaaac 12700 
tgaaactctgtacccattaaacaatagtgcatcctctgttttcccctccc 12750 
tacaatttatttttatttgggtttgtaccaaactgaaaatagctgcttct 12800 
tccttacttagttcagattagcatttccatttatttagccgtggttttga 12850 
ggatgccatgacagatgccatccttcctagagctctttggggctgtcagg 12900 
tatttcagtcagggtgaattcgggttgataacattttaaaatctcacttt 12950 
attctgaggttcctagtgtcagagcccaccgtatttttagggactcccaa 13000 
gttacaaacaaaaatatggtgaggaggaatcactgaagttttaacacaag 13050 
agacttacattttgttcaatttctatcttttagtttatttcctaagcata 13100 
aagaaatactttgaaaattttacatagcattatacatatttaattaagca 13150 
tgagcacatcttaaaactttaaattttagatcagatctttaattcctagg 13200 
atattaagaggtactggcaatttggccaggtgtggtggttcacgcctata 13250 
atcccaacactttgggagggtgaagtgggcgaattgctagagcccaggag 13300 
gtggaggctgcaatggcctgagatcacgccatcgtactccagcctggatg 13350 
atgagaatgaaatcctgtctcaaaaaaaaaaaaaaaaaaaaaaagaagaa 13400 
gaagaagtattggcaatcagtgctccaggaataatttcctgacttgaaat 13450 
aaacctacatgtagacaaactaattaggccattccaagagttgctagcat 13500 
tggtttaatatgttttcagagcattccaggaagcagtgtggccagcattg 13550 
catgtttgatacttcagaaatgtatgacaggtgtttctcttacccaggtc 13600 
ttctgttttcttagttttgctcatgtaaatatttatgaacatcctcatct 13650 
ttttgagggaagggattatagatcattctaattccattttctagcatttg 13700 
gtaccattctaagcacatgataggcacccatttggagcatttttggcttg 13750 
acagaatatgcatttagaattgttcaaattagaggtgtcagtgatgggaa 13800 
ttagaatactatataattctaagtcatttgacttaaatacaaaagaatga 13850 
ttttccttggtggggaatggtgaagggaggcaggagttaagaagaggaga 13900 
agagatcctaagtcatttataaacttctctggaaagacaggtgtgtgaag 13950 
actttttaaaaagtcattcaccaaattgtgtgtgtgtgtgtgtgtgtgtt 14000 
ttaaatagactttattttttagagcagttttaggttcacagcaaaattga 14050 
atgcaaggacagagatttcccataaaccccctgcccacacacatgcatag 14100 
cctccctcattatcaacatccccaccagagaggtgtttgttctagttgat 14150 
gaacctacactgacacatcattatcacccaaagtccatagttcacggcag 14200 
ggttcactgtcggtgtacattctatgggtttgagcaaatgtataatgaca 14250 
tgtatccaccattatagtaacatacagagtattttcagtgccctgcaaat 14300 
cccctgttctccacctattcatccctccctctctgcatttccacccccag 14350 
cccctggtaaccgctgatctttttactgtcccatagtttcggacgatcta 14400 
tttttcagacagacacagagctgtctttcccttagtttctattctatcat 14450 
ttctttctccccatccatcataaaaggctatgagttttttttaagtgttg 14500 
aacaccatcctacttgtcaagttaaaacataagctcctggctgggtacag 14550 
tggctcatgcctgtaatctcagcattttgggaggctgtggcagaagcatc 14600 
acttgaagccagaagtttgagaccagcctgggcaacatagcaagacccca 14650 
tccctccacacacaaacacacacacacacacacacacacacacacacaca 14700 
cacacacacacacaaaaacaagctcttgccagaattagagctacaaattg 14750 
ccctcaggttcctagaagatcagtccttcaattagattcagattgagatg 14800 
cttcctcttttaaacaatgattccctttctatcatgcccaataagaaaac 14850 



Fig. 16 (continued) 

aaataaaaattaaacaatactgcctgtaatctcagctacccaggaggcag 14900 

aagcagaactgcttcaacccggcaagcagaagttgcagtgaagtgagatc 14950 

gcgccactgcactccagcctgggaaacagagcaagattctgtctcaaaaa 15000 

caaaacaatgtgatttcctcctctaagtcctgcacagggaaatgttaaga 15050 

aataggtccaccaggaaagaaggaagtaagaatgtttgactagattgtct 15100 

tggaaaaaatagttatactttcttgcttgtcttcctaacagTTCTCCAAA 15150 

S P K 

GCTTCGTACCTTGGCCAGAGGCTTGTCTCCTGCGTACCTGAGGTTTGGTG 15200 
LRTLARGLS PAYLRFG 

GCACCAAGACAGACTTCCTAATTTTCGATCCCAAGAAGGAATCAACCTTT 15250 
GTKTDFLIFDPKKESTF 

GAAGAGAGAAGTTACTGGCAATCTCAAGTCAACCAGGgtgaaaattttta 15300 

EERSYWQSQVNQ 
aagattcactctatattttaattaacgtcagtccgtcatgagaatgcttt 15350 
gagaaaactgttatttctcacacctaacaattaatgagattaacttcctc 15400 
tcccctcatctgacctgtggaggaatctgaacaagaggaggaggcagtgg 15450 
gcaggtttccttatcatgatgtttgtcatgttcagtgtgaggcctcacaa 15500 
aaaaaaaaaaaaaaaaaaaaggcgtcctggatataactgagagctcattg 15550 
tacagtaaatattaataaaacagtgattgtagctgaaggatagaactgct 15600 
tggagggagcaagtgggtagaatcgcgtcaaactaaagagcatttctagc 15650 
caaagacacaatgatagattgaaggatatttattctaaatatagaatatg 15700 
ggtgaacgagatctgtggacttctgggctccaacgttagattctgatttt 15750 
agcaagcttgtcaggggattctgatattgaaaggctgtggccttcacctg 15800 
agaaacctgccctagggggccatgaaaatttgtcctgtctttcagaagtg 15850 
ctatcagacatcaaatggaagttaaatcgtatcttaacaattactaggat 15900 
gggcgcagtgactcacacctgtaatcccaacactttgggaggctgaggca 15950 
ggaggatcacttgagcccaggagttcgggaccagcctgggcaacatagag 16000 
agacgttgtctctattttttaataatttaaagagaaaaaaatactgaaaa 16050 
tattgtatacaccactgaattataataatgtgtatataatgtatatattc 16100 
attatgaggaatatttgattatttcatatattatatcttttccttctgtt 16150 
tattttatccagttatgaagtatttagaacaattcatcagtaattggggc 16200 
taaattgacagaatagtaatcagagaaaatagaaaaagacagatgggtta 16250 
tctttgaataccaggttggagttgtttatgggtttgttttttgttttggg 16300 
ggcgtttttttagacagagtcccactctgttgcccaggctggagtgcagt 16350 
ggcacaagcatggcccactgcatccttgacctcttgggctcaagcaatct 16400 
tcccaccttagcctcctgagtagctgggaccacaggtgcatgtcaccaca 16450 
cccagctaatttttttattttttgtagagacagtctttctatgttatcca 16500 
ggctgatctcaaactcctgcactcaagtgatccccctgccttggcgtccc 16550 
aaagtattgggattataggcatagccaccacacccaacctagtttctatt 16600 
tagacttggccctttcccaccagtcatttgtgtccaaaagatctcataaa 16650 
tgtagacaggaaactgtcctttgctcatcagttttcttcatcctgtgtct 16700 
agggggatggtcggtgggggaaactggggttatgcaagttcctctgaaac 16750 
atcctctgtgagcccagggatggatgaggcaccagccgccagcgagtcag 16800 
tgtgcagctttccagaaaggaagtcatcagccagtcagccggccctggca 16850 
gccagcacccggcaaccctgctgtcttgtgataaagaaatggtctgcctg 16900 
acaggatggtgtggatttttcttttttcttttttttttttttgagacagg 16950 
gtctggctctgtcgcccaggctggagtgcaatggcgggatcttggctcac 17000 
tgcagcctctgcctcccaggctcaaggcatcctcccacctcggtctcccg 17050 
agtagctgggaccacaggcacacaccaccacgcccaactaagttttcgta 17100 
tttttagtagaggcagggttttactatgttgtccaggctagtctcaaact 17150 
cctgagctcaagctatccatctgccttggcctcccaaagagctggaatta 17200 
caagcgtgagccactgtgcctgaccagggtggattttttcaagtgcacat 17250 
gttgtggtcccagaagctctgatggtaccaaattccaagcgaaaaaaagt 17300 
caatggttcccacccatcctacctcccatgatggcaagaggaaatcacca 17350 
cactgcagatacagtccatgtaaaacaaattgctatggattttgaaagtg 17400 
aaccttaagagaactgcactatgttttcttcattagagttctctggtaat 17450 
ttccagctttttttttttttttttttagacagtgtctcgctttgtcgccc 17500 
agtgtcacccaggctggagtgcagtgacgtgatctcggctcactgcaacc 17550 
tccgcctcgtgggttgaagtgattctcctgcctcagcctcctgagtagct 17600 
gtattttagtagagacgaggtttcaccatttggccaggctggtctcgaac 17650 
tcctgacctcaagtgattcgcccatctcagcctcccaaagtgctgggatt 17700 
acaggtgtgagccactgcacccggccagtaatttcaagcttctgaggagc 17750 
cctttgaattgttaaataacttgtagctatgtccaacatatccatgttca 17800 
gtgtatgttcgatatttcttaggaaacctgcccttggttgttttctttgt 17850 
ggtaattcatgagccggcaaatttgacatgtgttacagaatatacctttt 17900 
ctctgctctcctacctcataaccagaacttaattatcctgctttagtcac 17950 
ataaatagctaactaaataaatatatgagatttcagtctgctcactgtga 18000 
aaatagaccttctaaatgatctcttccacttgcagATATTTGCAAATATG 18050 

DICKY 

GATCCATCCCTCCTGATGTGGAGGAGAAGTTACGGTTGGAATGGCCCTAC 18100 
GSIPPDVEEKLRLEWPY 
CAGGAGCAATTGCTACTCCGAGAACACTACCAGAAAAAGTTCAAGAACAG 18150 

QEQLLLREHYQKKFKNS 
CACCTACTCAAgtaagaaatgaaaggcaccctagagatgttccagcccca 18200 
T Y S 

aagatatttgaataggttggactcgggcaccaatctagcaagtcctacgg 18250 
aagttgtataaagctgaaaatactgaagcatttcccaaatgggaaatcct 18300 
aaactcaaaacttgctttttggtttttttgtttgtttgttttttcttcat 18350 
ctgacattgcttagtagtcacagaatgaaagataaatcaatcattcatga 18400 
tctaacaatgaccttcagtgctctaaaaaactacggagtcaaggaaaaca 18450 
tgaatatattcctcatgtaaaattaaaatacagacatataaagggcaaaa 18500 
catgaacatcattcataccttgaggtccgtccccctcccagaaataaccc 18550 
ccagtatgccttggtttagagcattaagcaggagggccctgagtcactcc 18600 
agacagtcttgaccaccaagcagcattctctttttgtttcctctgtggct 18650 
tttgcaaacacagggctagctcagctacccattagtatgttttcagtcac 18700 
taaaacagtcttccagtcttcaaattaggatgacattgtcacatggggct 18750 
ttaaagcaagtgaaacaaggaaccccctttttttttttttttgagatgga 18800 
atctcactcttgtcgcccagcctggagtgcaatggcgcaatcttggctca 18850 
ctgcaacctccacctcccaggttcaagagattctcctgccttagcctcct 18900 
attcattatgaggaatatttgattattcagttcctgtagggtaaagatat 18950 
tacccccgatcatattattgattattgagtagctgagattacaggtgcct 19000 
gccaccacgaccggctaattttttgtattttttagtagagacagggtttc 19050 
accatgttggccaggctccaggctcgtctcgaactcctgacctcaggtga 19100 
tccacccacctcagcctcccaaagttctgggattacaggcgtgagccacc 19150 
actcctggccacaatccttttttaactatgaaatatatttttatctgaag 19200 
tttgatgtttatacccaactgagggatgatgttcccatatctcagttaaa 19250 
gaaataacctgctcagatacttcaagctcttcttttgacttttgaaaata 19300 
aatgatcttgaagttactatactttgtttgggttagttaacattatttaa 19350 
agtatattattttaattaattatctttgtaagattttactgtatactacc 19400
tggagttcaatgtatcagatggatttcaaatttatgtacattttttatgt 19450
atatggtacagaaaaaaatgtgatccataagaaatcagaaaatagcgcat 19500 
atgctaatagctaatgttgtcctctaaaaaacttatttttgcatttttaa 19550 
gagggggatatactctgacactttaataagtgtaattaattattgactgg 19600
aatttggcatgaggcagggccatttcagatcccattaaaggaatgacaca 19650 
taccagagaaccacagaagtaaggccacatttgtaataaatcattatagc 19700
tctgctaggagaagacccagttgtattaggtaattaatggatttgctctt 19750 
aaaacacatgtcccggaagatataggtgagtcttggggggccgcattaaa 19800
cattataccaatgtatcttacatttctaagaaagttttactactttacag 19850 
gatctttctgttaccaaaatggaaggtttccaactccaggacttggcttt 19900 
catagttcctacaccaggggaaatgccttcctttgctaactatgcaacca 19950 
ggttagttagtgtaagtccagccaccctgttggcaatgctaaaaggtaca 20000 
acaaacacagaattttatttgcatttgtaaacatttgatttctggctcga 20050 
aattttcagttttcatgggcacgtcatggaaacagaaatcttctgtgttt 20100 
agtttgggcacctactcattgtagtgacaaatatttcagaagccaatagg 20150 
ggattccacaaattgttctgaacctgtggctgagactggtaatggctgag 20200 
tgacatggggacataccacaaaagaagaggtagcaaaaggctgctgagat 20250 
aaggacatgttcattgcttagctagtggcctgcacccttaaaacacatgt 20300 
cccaggctgggtgctgtggctcacgcctgtaatcccagcactttgggagg 20350 
ctgaggcgggtggattacctgaggtcaggagttcgagaccaacctggcca 20400
acatagtgaaacctcatttctactaaaaatacaaaaattagccaggcatg 20450
gtggcgggcgcctgtagtcccagctactcaggaggcaggcaggagaatta 20500
cttgaatctgggaggcagaggttgtggtgagccgagattgcgccaccgca 20550
cgctagcctgggcgacaaagtgagactctgtctcaaaaaaacaaaaacaa 20600
aaaacaaacaaacaaaaaacaacaacaacaaaaaaacgggtatcccagaa 20650 
gatacaggtaagttttctaacacaggtcctcttgtatggtgcgttccact 20700 
taagtagaagatgacaaaaacatttgtcatgagaatatagactcacattt 20750 
taaacctgtttgagcaggaaaaggaagcaatgttacagatgtaattctgg 20800 
gtgtgactgcagaaaggatgactcccttattaaagtagtcatcctgagtg 20850 
agctaactctttgtacttcctcttctcctcctgttcccctcatcacccca 20900 
ttcttccgttgcctacacccaggcccacattggatgctgacatagactta 20950 
catggtacagtccaagggaaagatctgccatttttttcaatgtgtcatct 21000
tggttatcttcattccfaaggatctctccactttttatacagtaagagatg 21050
agagtctggaaaggattgggaataagataatgaattgtaagttttaaatt 21100 
gttcttcgtattttggggaaggagtaggctaggtggtccttctgtttttt 21150 
ttttgtttttttttttaaagtagatgtggccagacgtggtggctcacgcc 21200 
tgtaatcccagcactttgagaggctgaggcaggtggatcacttgatgtca 21250 
ggagttcaagaccagcctggccaacacagtgaaaccccgtctttactaaa 21300 
aatacaaaaactagccgggcttggtggcgtccacctgtagtcccagctac 21350 
tgcagaggtggaggcaggagaatcacttgaacccgggaggtggaggttgc 21400
agtgagccaagatcatgccattgtactccagcctgggcgacagaacaata 21450
ctctgtctcaaaaaaaaagagaaaagaaaagaaaaaaagaatggatttga 21500 
actcagtcgtcaatagcctctattccaggagatgttacagttgattatgt 21550 
tatagggggtgtataatagaatttcgagctatgtaaattccaagtgcatt 21600 
tggaagaatgaagaaatggaggaagggtaaagtatgagtgcaagcattcc 21650 
aggttttttgaaaatgctataatctttgttcagggctagtacaaagtgct 21700 
atttagctgtaagggttttttgtgatttacagacagttttcacatgtgtc 21750 
atttcaaccttggttttatggcgaaggcatgtgatggtgcttgtcccagg 21800 
actttagatccatatctgaggttcctgtcgggcaaagatattacccctga 21850 
tcatattatagtctataagtgggagagttgtgcctggagctcaagtctta 21900 
tgatttctgatccagggcacttcctacaacatgattttgcaatataaaag 21950 
cctataatgtgtgactaaagcaggtcactcaccccttgtaacagactcta 22000
gtaatggtactgccaccaaacggctgcgtgatattgggcaaagacttacc 22050 
ttatttgaatctcagtttcctccfcagaaaaatgagggtggaggttaagca 22100 
taggctgatgatcctaaagcctccatactgccctaaactgtggctctaag 22150
atccagtagaatgctgggrcacaggactctagggagcttttcaaacccaa 22200 
atgtctgtcattccttgatggtaggcagcagtttatggaagtgggcgaca 22250 
cagcaaatatcaaaatacctaaagcagcttgcaagagttgtttctgccta 22300 
gtggtctttatagttaatattaaatagttaattttttttttttttgagac 22350
agagtcttgctctgttacccaggctgcagtgcagtggcacaatctcggct 22400
cactgcaacctccacctcccgggtttgagcaattctgtctcagcctccca 22450
agtagctgggactacaggtgcatgccactgcacccagctaatttttgtat 22500
ttttagtagagacggggtttcaccatattgggcaggctggtctcgaactc 22550 
ttgacctcaggtgatccacctgcctcagcctcccaaagtgctgggattac 22600 
aggcatgagccactgcacccagcttaaatagctaatatttaatattattc 22650 
tatagttattcaagtaattcaggccaaagacttagaaacaaaacaaaaag 22700 
ccacttttaaggagaaagggtgtaagtttgccagatagatagagatcttt 22750 
cttttttaactacaagagttcaggaatgaattactctttaacaaacgact 22800 
atagatatacatgaaaattggaaggacttattatgcatatgataatcaat 22850 
ttaaagacaacacttaaaattatattgttgccactctcaaaaagtggtaa 22900 
tagaacagctaatggtttaaaaagcagagtacagaagttcccaaacttat 22950 
ggcaccttaatatcgcagaaaactttttaaagcatgcctaggccacaaaa 23000
aatacctgtattttgattattaaattgtaaggtctacacaacctaatagt 23050 
aataggtccaatagtaatgctgtccaatagatgttgatgtttttttcctt 23100 
gcaaacttaaaagatcctacagtgcctctgtaaatagcactgcctggtta 23150 
gagttgaatttcagataaataatttttttcatgttaattatttttctttt 23200
ctttacttttttttttgtttttttgtttttttgtttttttttttgagaca 23250 
gggtctcattctgttgcccaggctgctgtgcaatggcatgatcatggctc 23300 
actgcagccttgacctccctgggctcaggtgatcctcccacctcagcctc 23350 
ccaagtagctagctgggactacaggtgcttaccatcatgcccggctaatt 23400
tttgtgttttttgtagagatgtggttttgccatgttgcccaggctggtct 23450
tgaactcctgggctcaagtgatccgcccgcctcggcctcccaaagtgcta 23500 
ggatgacaggcatgagccactgcacctggcccctgggcgaagtatttctt 23550 
aatggttacataggacatacactaaacattatttattgtctatatgaagt 23600
tcaagtttaactaggtgccctgcacttttagttgctaaatcctgtagctg 23650
tacccatgcattcactggtgctccccagcttgccttgcacagagtttgga 23700 
aaccatagtcctataactctaggccaattttttaatgtaaaatttgattc 23750 
attttaaattaataaataataacaggaatttttttaaaaattgttttaaa 23800 
tataattaaaattatcaaaatattttttaactgaacttgtgactagagat 23850 
atttagattatgaagagtggggtttatgctaactaatgacagtctggcta 23900 
tgcatgtggagcactgagctataaattgtggcttccccaattctcctgat 23950 
gtcacttgaacaaaacctaagtgtcagaccagagcttctggtatcttcca 24000
tgggatttcattcaacagctggagcaaatgaagtcagattgatttttttt 24050
aatttgtccaattttgttgtctcaaaaacataattataatcatttattag 24100
aactagaatttcttcagtttaacaacagaaatagttattcattatgaaaa 24150 
gcgaatctggaggccttcattgtggtgccaatctaaccattaaattgtga 24200 
cgtttttcttttagGAAGCTCTGTAGATGTGCTATACACTTTTGCAAACT 24250 

RSSVDVLYTFAN 
GCTCAGGACTGGACTTGATCTTTGGCCTAAATGCGTTATTAAGAACAGCA 24300 
CSGLDLIFGLNALLRTA
GATTTGCAGTGGAACAGTTCTAATGCTCAGTTGCTCCTGGACTACTGCTC 24350 

DLQWNSSNAQLLLDYCS
TTCCAAGGGGTATAACATTTCTTGGGAACTAGGCAATGgtgagtacccca 24400

SKGYNISWELGN 
gggaacaattcattaataaggagattccccactagcattatttcttttct 24450
tttctttttcttttctttttttttttttttgagacagagtctcgcactgc 24500
tgcccaggctggagtgcagtggcgccacctcggctcacttgaagctctgc 24550
ctcccaaaacgccattctcctgcctcagcctcccgagtagctgggactac 24600
aggcacccgccaccgcgcccggctaatttttttttttttttttttttttt 24650
tttttttgcatttttagtagagacggggtttcaccgtgttagccaggatg 24700
gtcttgatctcctgacctcgtgatctgccctcctcggcctcccaaagtgc 24750
tgggattacaggcgtgagccaccaggcccggctagcattatttcttatga 24800
cactttttttttttttttgagacggagtctcgctctgtcgcccaggctgg 24850
agtgcagtggcgccatctcggctcactgcaagctccacctcccaggttca 24900
cgccattctcctgcctcagcctcccgagtagctgggactacacgcacccg 24950
ccaccacgcccggctaatttttttgtatttttagtagagacggggtttca 25000
ccgtgttagccaggatggtctctatatcctgaccccatgatctgcccgcc 25050
tcggcctcccaaagtggtgggattacaggcgtgagccactgcgcccggcc 25100
aacactctttttattattagcaaatatacttctgcctgggcacattcttg 25150
caagtgctcaacaatgcaacttttggaagtgcatgtggcagaaactcctg 25200
ct^tatttsttccagaacctattattgctaatccircagtttatgttacatt 25250
tgaagtgagaaccagttggagccagcaacgttcccagctccaaagttccc 25300
ttgagattttcagaatcacttaaccctattatgcttggcaacctggactc 25350
agcaaaactgggaagtcagcagtttgttttattcatcccttcctttctca 25400
gtttctcaaatgtgtcagttaatctcagtaaccccattgcaaccttcatt 25450
acctgcccaagcggtctagaacttgccagtatagaatcctacgtgggtca 25500 
agctcctgactgtctccttcttcactctttttttgcaaagaacttgtaaa 25550 
ttttaactataagtattcatgattcgccacatttattcaaaacatagagt 25600 
gctttttccacatatcagccaatggaaataaggattaaatgggaaatgaa 25650
atgtagtaataggataagcacaagtcttcttcctgctcaaactttttttt 25700 
ttttttttttcagacaagatcttgctctgttacccaggctggagtgcagt 25750
ggcgtgttcatagctcaatgtaacctccaactcctgggctcatgcaatct 25800 
ctcacacctcagccccctgattagctaggactacactatgcctagccaat 25850 
tttttttcttttgtctggttgtgttgcccaggctgtctcgatctcctggc 25900 
ctcaagtaatcctcctgcctcggccttctaaagtgctgggattataggca 25950 
tgagccactgtgcccggtctcaaacctttttttccaaagtaaatgaagtt 26000 
attagatatggaatatagtctagttcccagatatccatatccattggttt 26050 
attaccctcattattaacttcaaattgtttaatagaccctcatatctcag 26100 
ttatacagttaaaatttttgttttgtttttctggagtatcttatttataa 26150 
ctatgagttttactttacttatttattttattttttgagacagacgcttg 26200 
ctctgtcactcaggctggagtgcggttgcgtgatcatggctcactatggc 26250 
ctcgaccttctgggctcaagtgatcctctccctcagcctcccaagctgag 26300
actacaggcatgcaccaccacatctagctaatttttttttttccccatgg 26350 
aacaaggctttactatgttacccagagtggtctcaaactcctggcctcag 26400 
gggatcctcctgtctcagcctaccaaaatgctgggattacaggcatgagc 26450
catagcgccagacctggttttacttttcttgactttgaattacaagtttt 26500

tgtaatttggaaaatgttttgttgcttttaaatactgctgtatgtttgct 26550 

tttaaatacaacatttctcgatatatattttgagaattgctgtctttcag 26600 

AACCTAACAGTTTCCTTAAGAAGGCTGATATTTTCATCAATGGGTCGCAG 26650 
EPNSFLKKADIFINGSQ 

TTAGGAGAAGATTTTATTCAATTGCATAAACTTCTAAGAAAGTCCACCTT 26700 

LGEDFIQLHKLLRKSTF 

CAAAAATGCAAAACTCTATGGTCCTGATGTTGGTCAGCCTCGAAGAAAGA 26750

KNAKLYGPDVGQPRRK 

CGGCTAAGATGCTGAAGAGgtaggaactagaggatgcagaatcactttac 26800 
T A K M L K S 

ttttcttctttttccttttgagacagagtctcactctgtcagccagactg 26850 

gagtgcagtggtacaatcatggctcactgcaacttcgacctcccaggctc 26900 

aagcaatcctcccatctcagtcccacaaatagctgggactacaggtgcac 26950 

atcaccacacctggctactttaaaaaaatttttttgtagagatggggtct 27000 

ccctgtgttgcccaggctggtctcttgaattcctgtgctcaagccatcct 27050 

tccacctcagcctcccagagtgccaggattacaggcatgagccaccacac 27100 

ccagccaccacttttcttaaaaaaaaaaaaagattctctctggtagacaa 27150

tcctcaatagtccacatgttattaaacaatctgctgcctgaatacatgat 27200 

ttaccaaaaaaaggaaattttgacgggttcagaatatcaagggatctgag 27250 

gcaaatgtcacctatgataaaatttgctatcaaaattaggaagtttgtgt 27300

ttacctgatcctaaagcagtaaccagcccatttctagggaataaaactct 27350 

catgcgtatattgtgcatatatatgtattatatgactgagtgataataaa 27400

attttttttctagCTTCCTGAAGGCTGGTGGAGAAGTGATTGATTCAGTT 27450

FLKAGGEVIDSV

ACATGGCATCAgtaagtatgtctcctattcttaatactaggaaagtaagg 27500 
T W H H 

ctagctttatttattacctagtattcaaaaagttagttcatttaactgcc 27550 

aattgactgcagttcaaataagaaacaaatagtgtctcaagtagcactgt 27600

actccaattttaatattaataaaaaaaattttaagttattttaaataatg 27650 

tagtggtttctataaagatcactttatacagaagaacagtgccaattaac 27700 

ccatggaacatataagtagctaaaaccaattgcttgccaaagaaccagta 27750

acccaggagtacatgtccttgccactgtgttttttcaagacagagtaact 27800

gatttctagttacttgcatagaatggactcctcctcataactcccttcca 27850 

tcttggtctttccctagtagaacttctacctttttttagtaacaggtgag 27900 

tcggagaggtaagaaggagaataaggtcagcaattaacctaaaagcagaa 27950

agtaaaatttgttattttttttctgaatattttctgtgtaatttagCTAC 28000 

Y 

TATTTGAATGGACGGACTGCTACCAGGGAAGATTTTCTAAACCCTGATGT 28050

YLNGRTATREDFLNPDV

ATTGGACATTTTTATTTCATCTGTGCAAAAAGTTTTCCAGgtaatagtct 28100

LDIFISSVQKVFQ

ttttaaactttttaatgtaaaaccagaatccttattttatagtctagcta 28150 

gttctaaattctataggtatgtatatttacatgtttttctaattttagag 28200 

aacaagcactatgacttatccactgttagttttccccttagcattgggtc 28250 

ttaccccatgtacgtgattagaaatttgaaatatttccaatagcctttag 28300 

tagaattaactcacatagatgataagaatgggttggttcacttcatgttc 28350 

cttccacagcctactatttcaataaaagaaagtttcccaagacctaaatg 28400 

actatgaacatattttataactatataggaggggtgggtctaggaataca 28450

aagttttgaatgctgttaatcttcaacaccacagttgaaaccacaggtca 28500 

gcttttttgcaattaccatggatacttttctgttctatagGTGGTTGAGA 28550

V V E 

GCACCAGGCCTGGCAAGAAGGTCTGGTTAGGAGAAACAAGCTCTGCATAT 28600 
STRPGKKVWLGETSSAY 

GGAGGCGGAGCGCCCTTGCTATCCGACACCTTTGCAGCTGGCTTTATgtg 28650 

GGGAPLLSDTFAAGFM

agtgaagcagcgctggccttaggggtcagagtgcagctcttctccatcct 28700 

tctattctgctgaaatagctccccagccaaaaagcagatcaaagaccgtt 28750

tcagtggctgagccccaaaattcatgccagattttgcaagaaaatgattt 28800 

actaaagcttgagggacatctttaacaagtgttccaaattaatcactata 28850 

aggatgaattgtttcagaaattttggcctttaattatggcccataaatat 28900
Fig. 16 (continued)

gtcaagtagtccttactctaaagaagtacactgtaaaagaatgcatatag 28950 

ccggatatggtagttccctgtaatcccaatactttgggaggccaaggtgg 29000 

gaggattgcttgagcccaggagtttgaggctgcagtgagttatgatggtg 29050 

ccactgcactctagactgggcaacagagtgagactgtctttttttttccc 29100 

ctctgtcacccagactggagggcagtggcacgatctcacctcactgcaac 29150 

ctctgcctcccggattgaagcgattctcctgcctcagcgtcctgagtagc 29200 

tgggactacaggagtatcaccgcactgggctaatttttgtatttttagta 29250 

gagacggggttttgacatgttgcccaggctggtctgaaacccatgagctc 29300 

aagtgatctgcctacctcagccttccaaaatgctgggattacggacatga 29350 

gcjtaccacgcccggecacaccctgtctcttaaa^aaaaaaaaaatgcaajg 29400

ttagagcatattacagctttgtctctcaggaggatacttagtgtatgtag 29450

ctataattcatagattcccaagaagtttagagcctaaagtatgaggtccc 29500 

accagaggggctatcattaaatttaaagatttgttaaatcatctcattgt 29550 

ccaacaccacaaacttgattgctttaaaatactggtttagttacatttag 29600 

taactctattagtgcttttaatctatactgctatatcctcacattgagat 29650

tttttttcttttctcttccatcttcattcttttttctctcatcctcattc 29700 

ttataagcctagaatacatcacaaatcctttatgcccatggaagcaagag 29750 

gaataaagaatggagatgtttgttttgccattaactaaagatctggggtg 29800 

tcggggagaagggggatagagaaggagaagtgggaagaggtgtccataat 29850 

agcttaggtgcaattctgcttattttacattttacccccgctgactgcca 29900 

ctttttcttcagccctcacacattgtttgtgcagggacctcataggacca 29950

ggaattgtctatagaggtgggaatttgtctcaccctgaaagggatacctc 30000 

tagcatggtaatagtcttctaggatttgttatcatatggaaagatgtaaa 30050 

gggsigggattctgctgctgctgctgctgctgcatgcagttgccatttcat 30100

ttaaatgacttatttataattgatgacacttttctggcttcctgttaatt 30150 

cctccctcaaagatcaataaaccagaaccaggcatggtggcatgcacttg 30200 

tggtcctgtaaccacccaacaggttcaccttgcctgctgtctagatagag 30250 

ccaattatcaagacaggggaattgcaaaggagaaagagtaatttatgcag 30300 

agccagctgtgcaggagaccagagttttattattactcaaatcagtctcc 30350

ccgaacattcgaggatcagagcttttaaggataatttggccggtaggggc 30400 

ttaggaagtggagagtgctggttggtcaggttggagatggaatcacaggg 30450 

agtggaagtgaggttttcttgctgtcttctgttcctggatgggatggcag 30500 

aactggttgggccagattaccggtctgggtggtctcaaatgatccaccca 30550

gttcagggtctgcaagatatctcaagcactgatcttaggttttacaacag 30600 

tgatgttatccccaggaacaatttggggaggttcagactcttggagccag 30650

aggctgcattatccctaaaccgtaatctctaatgttgtagctaatttgtt 30700 

cgtcctgcaaaggtagatttgtccccaggcaagaagggggtcttttcaga 30750

aaagggctattatcatttttgtttcagagtcaaaccatgaactgaatttc 30800 

ttcccaaagttagttcagcctacacccaggaatgaagaaggacagcttaa 30850

aggttagaagcaagatggagtcaatgaggtctgatctctttcactgccat 30900 

aatttcctcagttataatttttgcaaaggcggtttcagtcccagctactt 30950 

gggaggctgagacaggaggattaatggagcccaggagtttgaggttgcag 31000 

agagctatgatcacgccactgcactccagcctgggtgacagagtgagacc 31050 

ctgtctctaaataaataaataagtaaataaataaatacataaataaaatc 31100 

aagatggtgtgcaattagaattgagcgattttgtttccaaacctcaagaa 31150 

agcttggtcttgctctgtcccagGTGGCTGGATAAATTGGGCCTGTCAGC 31200

WLDKLGLSA 

CCGAATGGGAATAGAAGTGGTGATGAGGCAAGTATTCTTTGGAGCAGGAA 31250 

RMGIEVVMRQVFFGAG

ACTACCATTTAGTGGATGAAAACTTCGATCCTTTACCTgtaagtgaccat 31300 

NYHLVDENFDPLP 

tattttcctaattctagtggagtagattaaagtcaactcaggacctctgg 31350 

tgttaacctcctatgaacagtcagtcctctcagtaactagccaaatcatg 31400 

agatgatgaattagaaggagccttagatagcatccaatctaacatttttt 31450 

tgtgtgtttgaagagaagaaatcaagagctaggaataactttttaaaggt 31500 

aagccatttgcagtatagtgtggattttgtttaaaaggggataatttgaa 31550 

attttatgactcattatacaagacaaaataagttggattttcaaatgttt 31600

tacaaagtaaatcaaagttataattgcctacagtacgcaaagcttcaaaa 31650 

cattttttatgttatgaaattgtaatttatttaaccttaaaatgagccag 31700

taccatgtgtttgcttaaaaatctcatgctaagaatttactatgttgtta 31750 

ataatcttcaagatatttatgaataaagtcttatttctaatccttcctcc 31800
aactgtatctggtgctaaatcaggaaatgtttcttcccaaaaagcctcgt 31850
ggaagatctgtatgtctaaatatatgtcagggataatacagatgtagccc 31900 
tgcgaagcatgaccttgatttttatagtctaaaatgtcatttgcagatat 31950 
ctattttctaagaataattcctaaaagaattatttgaatgttgtaggaaa 32000 
gctaagaaattttgcaaagagcgtacgtgaaaatataagctaggcttttg 32050 
tggtttgtggatagacttcccaacaaaattgctttttatctatagtgatc 32100 
caagcttgtggaacatattagtcatctttttttagaaaattcttagaaaa 32150 
gtgatcttgcaaaaatggaatttatctttccccaagtatattctgtcatg 32200 
tatagagttaaactaagcatagtaatttcaccagacaaacattcaaaatc 32250 
tactcctgacctttttatctcatccaaattttcccagggcccagacataa 32300
acctttgccttacgaactctttgtatatgcactaaatatgcttctccttc 32350 
aaggttctcagtcagctagaaaaatgtgcaagagtaaatggtacccttct 32400 
cacttgtagatccaagagaattagacttaaactcactctacatgtctgtg 32450 
actttattttatttgcatgacagtcctgtgaggtggcaaggcaggtatct 32500 
tggatccattttttagataaggaagttcaaattgagaagaggttgcatga 32550 
tttacaggaagccatactgtagtcctatgttactcttaaaaatcccattc 32600 
aaatcctgcttctgaggcctgcatactttctaccctaccagtcattgacc 32650 
catgcttatgtctcctttgaaaacattgattccactcttgtctccagtga 32700 
aaaagtggaatttaagcagagaaacaaaagccatttgtcttgttaagtct 32750
actttccctctactttcaagaaggaaagttggggtatgtgttgaatggtg 32800
atttatttatttatttattattttaaaaattgatacaaggtcttactgta 32850 
ttgtgcaggctggtctcaaactcctgggctcaagtgatcatcccacctca 32900 
gcctcccagtgttgggattacagcatgaaccattgtgcccaccaccgatc 32950 
cgcagttttttaagaaaaacttttactatagaaaattttaatcatataca 33000 
aaatacagaggaaagtatatgaacccactttaggagactagaatatgcca 33050
ccccaaaatatgccactttggcataaggattatttcgagctaaaggcaac 33100 
tgggaagaaacacatagaagaaaagttctctgtccttctccatttgccta 33150 
aaagcaggacatgaatcttaaaagtccccctccttccctttctaccagga 33200 
aaaacaagagttaatcactgaagataacttcagacccttatcagtgtaga 33250 
gatggcactagaagaatctatattacatactcatttattttccttcccac 33300 
aacttgccaccccagagactaaaaatccttttcctttgtcatgtctcttg 33350 
tccaaaaatttgctctataagctggagttctaagccacctctttgagaat 33400
tacttgttccctggtattttctgttaacatacatgtattaatatacatgc 33450
taacaagcttctgtttgtttttctcctgttttctgtcttgttacagaggt 33500 
ccatcccaactaagaactaaagagtaggaggaaaatataatttcctcctg 33550 
catactttgatcttgtttaatccgtaacccttcccacttttcacctccta 33600 
cctantagactactttgaagcaaatttcagatatatiactttatctataa 33650
atatttcagtatgtgctaggtgtggtggctcacacctgtaatcccaacac 33700 
tttgggaagctgaggcaggaggatcacttgagcccaggagttcaagacca 33750 
gctacggcaacaaaaaatcaaaaacttatctgggcatggtggcacatgcc 33800 
tgtggtcccagctacatgagaggctgaggcaggaggatcgctttagccca 33850 
ggaggttgaggctgcagtaagctgcattcacaccactgcactccagcctg 33900 
ggtgacagagtaagaccatgtctcaaaaaaatacatattttagtatgtat 33950 
cctttttgtaaaaacacaatacttttatcatactttaaataataacaata 34000
attccttagtatcaccaaatattttgtcagtgtctcacattttccttatt 34050
gtctaaaatattgttgatagttattcaaatcagaatccaaacaaggtcca 34100 
tatattacatttggttgacaagtctcttaagtttgttcatctttaagttc 34150 
ttcctccctctctttcatctcttgtaatttattaatgtgaaaaaacaggt 34200 
aatttgttctatagtatttcctacattatagagtttgctacatttattcc 34250 
ctatgatatcatttagcatgttcctctgtcccctgtgtttcctgtaaact 34300
ggtagttatacctagaagcttgagtttattcaggtttttaattgtatttt 34350
ttttgcaagaattctttattatctgcttctggaagcacagaatgtctggt 34400
tgtgtctggttttgatcttgacagctactgatgaccattgcctaatccat 34450
tactttattggggtggggggaataaggttttaaaataaattttttttaaa 34500
gatttttttaactgttattttgagacagtgtctcatttcgtttcccaggc 34550
tggagtgcagtggcacaatcacggctcactgcagccttgacctcctggga 34600
tcaggtgatcttctcacctcagcctcctgggtacctggaactacaggtgc 34650
acaccaccacacctggctaattttttgtattttgtgtacagaaggggttt 34700
catcatgtttcccagactggtcttgaactcctgggttcaagtgatctacc 34750
cacttcagcttcccaaaatcctgggattacactttggccaccgtgcctgg 34800
cctaaatgaaattatttgtctctaaacagacagaagttttactttaaaaa 34850
tttgtctttgtgtgtacatgtgtttgtgtatgtgtgtgtgtctaaaagtt 34900

tggctttgagctttgctttgaattcttggatgaacaataaccaagaatac 34950

ttaaactctgatcattcttgacagatatcccctacaggctatggcctttt 35000 

gaattgtgtcctccagtgataaaaagcagcaagcacgatactgctctcag 35050 

attcatggtggtcacatgtgaggtgaaaaaaaaaaaaaagatgaatccta 35100 

tttaaatgcccccaggataacagtgatactctttgtaggataactatttg 35150 

cttgccactggtttcattaaataaggacataagtaaagatctatttttgt 35200 

ctctttctccccaaccaccacaactagGATTATTGGCTATCTCTTCTGTT 35250 

DYWLSLLF 

CAAGAAATTGGTGGGCACCAAGGTGTTAATGGCAAGCGTGCAAGGTTCAA 35300

KKLVGTKVLMASVQGS 

AGAGAAGGAAGCTTCGAGTATACCTTCATTGCACAAACACTGACAAgtaa 35350 
K R R K L R V Y L H C T N T D N

gtatgaaacacaccctttaccaatcatcaagttttagtgggtaagcctgt 35400 

aactttactcaaacaccctgttgcatgtgtctatacattgcataagtata 35450 

ggcagttgcaatttagtaaagttttatacaacgattttattttattttat 35500 

ttttagaagaaaaatgctacttttgttgttgttgttttttgagacggggc 35550 

ctcgctcgtcacccaggctggagtgcagtggtgcaatctcagctcactgc 35600 

aacctccgcctcccgggttcaagtgattcttgaagaggagaacaataata 35650 

acaacaatattattttcaaaagttgtgaccgcagtttctggagttgagaa 35700 

gacatcgagatttttgtagcctcatactcttgctttaggtagcaaaaaat 35750

gttcctaaatctcaggaatattctctagataggtttcaatctatcattcc 35800 

tgataagatgatgctgaaatactaattctagccaaaaaagaccagctacc 35850 

atttccgattgttggggactgggaactctggatagtgaggaccccagtag 35900 

gaagtagcgaggggaatggtttgaatggataaattcataaaaaatgtcag 35950 

tagatttaattttcttatacatttcagtctttttataaggctaggaaaag 36000 

cccctgtttttatggtttataatttgaattcacatgaacccacaaaattt 36050 

gccttttaccttcctatgtctgaaaatggatagtctggctggcctcttaa 36100 

caacccagctggcagagctgtgaggatctcagtgtgctctagcccagaca 36150 

ttggtagcatgaacggcaacatttttaattgtgttttcaaaataggagca 36200

cactagcggtctaaaacgatcataaaagaaggatactaagagggcccact 36250 

gtcattatggatcctaatacttaggatgcattatggattgtcattatgga 36300 

tactaatacttaggatcacatttgtaattgagtttttaattgcttaaatt 36350 

agatacatatttctattaagttaacctctttgcttttagTCCAAGGTATA 36400

PRY 

AAGAAGGAGATTTAACTCTGTATGCCATAAACCTCCATAATGTCACCAAG 36450
K E G D L T L Y A I N L H N V T K

TACTTGCGGTTACCCTATCCTTTTTCTAACAAGCAAGTGGATAAATACCT 36500 

YLRLPYPFSNKQVDKYL

TCTAAGACCTTTGGGACCTCATGGATTACTTTCCAAgtaagcaatttccc 36550 

L R P L G P H G L L S K

ttgttcattccaaactttcaataaatttattggtgtttatcagaatagag 36600 

agtttggacagggagcaaaagacaaagtcaactatatcaagttctaataa 36650 

ttcttaatattcaggaaatttatgtatgaatacttactaatatgagtata 36700 

actcatcctaagagtctaaagcaaaaggatgtgaacacaaactagcagtt 36750 

atcttagagaataagtttgcatttcaaaataacttgacatatcaagatcc 36800 

actcaacgcatttaaattatttactctaaaaagacataattcttggtaac 36850 

acattcactaaagcaaaatatacctttatataattgctatcaaaggtatg 36900 

tgggttggtataaaatatcataccatgtgagatcagtgtgattcctttac 36950 

agcattaatttttattggttagagtaagaaaaagaatagctagagtatat 37000 

ttcttaagtagattctcatacactttggtttcaaaaaccaattattgact 37050 

acatcttataaaagcctgtattcaatggagtgccaaaaaatgactatgag 37100

tcttaaagagttaggcatataaatattttaaggtttctgttcaatgtatg 37150 

ttggaaggagttcctttctcatgactattctcatattggagcataaaaag 37200 

agtttacaggcttggcgcagtggctcatgcctgtaatcccaatactttgg 37250 

gaagctgaagcaggcagatcacttcagcccaggagtttgagaccagcctg 37300

ggcaatatggcaaaactctctctacaaaatataccaaaattagccaggcg 37350 

tggtggtgcatgcctgtagtcccagctacttgggaagctgaggtgggagg 37400

attgcttgagcccaggggggtcatggctgcagtgagctgtgatggtgcct 37450

ctgtcacccagcctgggtgacagagtgagaccctgtctcaaaaaaataaa 37500

taaataaaaattaagagtttacaaaattctcaccatctcctcccatcttt 37550
gcaaatgccacataagtgatgtgttccaggactattagcctcggaacctg 37600
aggcagtacagtaagcacgctttctccaaagtcctgtcccccacagacaa 37650
acattatttacactgggtactgctcttttattttttcccctctatgcttt 37700 
attttactataactataatcatataacatgtaataggaaaaaggcagggt 37750 
cgggggagagatccagaagtcttcccaagagcctttccaacatagcctct 37800 
gtagacattttttctttcttctttttttttttttttttttttctgagaca 37850 
gagtctcactctgttgtccaggctagagtgcagtggcgtgatctaggctc 37900
actgcaacctccgcctcctgggttcaagcaattctcccacctcagcctcc 37950
ctagtagctgggattagaggcatgcatcaccacgcctggctaatttttgt 38000 
atttttagtagagatgaggtttcaccatgtgggccaggctggtcttgaac 38050
tcctgacctcaagtgatccacctgccttagcctcccaaagtgctaggatt 38100 
acacgagtgagccaccgtgccctgcccctattacattctgatcacacatt 38150 
tcatgttttataattggaaaactggtgaaattatagacaatgttttgttc 38200
ccctaaattctctttgatgagtatatattacttacactcttctgtcttta 38250 
aaattttgcaaaatagtatcctagataagtttatgagtgcacagtctgta 38300 
cgcttactcatattaatgacctcggagagttaaacaacagtcacctttaa 38350 
aaattattactatcattatcattatttttgaggcgggggtctcattctgt 38400
ctcccaggctggagagtagtggtgcggtcacagctcactgcagccaccgc 38450
tacctgggctcaagtgatccttcctcctcagccttctgagtagctgagac 38500 
cacaggcttatgctaccacacctggctaattttttaactttttgtagaga 38550 
cgatgtctcattatgttgcccaggctggtctcaaactcctaagctcaagt 38600
gatcttcctcagcctcccaaagtgctgggattacaggcatgaaaaactgc 38650 
acccagccctaaaaattattagggtcctgcatagtaagactttaataaat 38700 
atttaaatgaacatctggtttttttaaaaaaaaaatagagacaaggtctc 38750 
actatattgcccaagctggtctcgaactcctggactcacgcaatcctgct 38800 
gccttagccgcccaaagtgctgggattacaggcatgacccacctcatctg 38850
ggctgagtgaacatatttttaacataaaggccgtattttatatttatctc 38900 
atacattttgcccagcatccccatttccgccgaatctgttgcttgctaat 38950
tccttccagcttcatttcatctgaaatttgacaaacatcttctatttctt 39000
tgtcgtcatgttattgacttcagaatataaaataaaacactatacccaaa 39050 
ttaaaccccaccctcattgcccagcctgatgtgaaaataatcagcataca 39100 
ctaagcttacccttgatatatgtgtagcatcttttagataaatatacagc 39150 
tgattaagcaatatagcctgatggtataatatcttgcccatgtacctcat 39200 
cttatctccagcaggattaattcacagtgatcagatttacctttaaactt 39250
tgtagcaaaatatcctctccaaaagcatatctaaaacttttgtgtgtact 39300 
cttgcaagtttcttaatttcatgcagaacaggctcttaccactgttagct 39350 
ggagatattttcaagacctatttttgtttgtggtttcctgatgatggtca 39400
tggcatttcccccttcactccatctaaaaattgaggtgatacaggctttt 39450 
aaacaaaaccaactcatatagactgagtacaactgcaatgcaggcatgct 39500 
aacctctgctacaatcatgggcgtgctattgatatgtcttaagttacaga 39550 
acacagggctgagcgtctcattaggtcaaaatgtaaaccagtttttctgc 39600 
tcactgatgcttaatgaggacagggtgtgagagatttctttaaggaaaac 39650 
aaatatataataatgctacatggaaaaatatctaacattagagaattaag 39700 
taaataaactaatatactcacaccatggaatcttgtgcagacattaaaat 39750 
tatgtagtggatggatgtttaatggtgtgagaaaaagttaggatgtgctg 39800 
gggtggggggaagaatcaagttttaagaaaatacagtatacccatactta 39850 
agtaaaaaaaaaaaaaaaggtatgtacagtcatgtgttgcttaatgatgg 39900 
ggatacattccgagaaatgtgtcgataggtgatttcatccttgtgtgaac 39950 
atcatagagtgaacttacacaaacctagatggtctagcctactatgtatc 40000
taggctatatgactagcctgttgctcctaggctacaaacctgtaaagcat 40050
gttactgtagcgaatatacaaatacttaacacaatggcaagctatcattg 40100 
tgttaagtagttgtgtatctaaacatatctaaaacatagaaaactaatgt 40150
gttgtgctacaatgttacaatgactatgacattgctaggcaataggaatt 40200 
ataattttatccttttatggaaccacacttatatatgcggtccatggtgg 40250
accaaaacatccttatgtggcatatgactgtatacatgtacacaaaaaat 40300
agatgaaagaatgaatatacatcaaaatatttaaaatggttataatgact 40350
taggttacttttatttatcttagtaataataatgatgatagataatactt 40400
ttatagtgtttactatataaaagacactgttataagtgttctacatactt 40450
tacatgtattacctaaatgatataaatataactctgacagtaactaatct 40500
tatacgttctcttttcttttttttttttttctttttttagacagaatctt 40550
gctctaccaggctggagtgcagggtgcaatctcggctcactgcaacctcc 40600
gcctcccaggttcaaacgattctcatgtctcagcctcctgagtagctggg 40650

actacaggcacacaccaccatgcccggctaatttttgtatttttgggtag 40700

agatggagttttgccatgttggccaggctgatcttgaactcctggcctca 40750 

agtgatctgcctgcctcagcctcccaaagtgctgggattacaggtgtgaa 40800

ccactgtgctcggcctaatcttacaagttttcaatatttaaagagtgcta 40850

actttgttgacaatataaaacatatttgagaaaaagagatataagcatict 40900 

tatttagaattatgaaaatatcaatagacctacagccgactaaagctttt 40950 

cttcataagctcttgcctatattgattcgctcctgtgaatatgcattaat 41000 

ttgatttaaataataagtatgtataagaaataacacttttccttaatttt 41050 

taagaacgttoaacagtttttaatttgaattccaatagtgaaatacatag 41100 

aaaatataaaattttctgtagtttagccaaattgtttttgtttcaccaca 41150 

gcattctaccaaaatttcttaataacagtaagaaaatgaatgcatacctc 41200 

ctgcagggagaggggagttaggcagtttatgggcatagttacaagtgaga 41250

aatttcattggctaccatttacgctaaattcataaaaactgcattcaatt 41300 

ctatatatctattttctttacataaaaaaggtttcaattattggccatta 41350 

aataaaatagccaccattccagaagttgtgtcatgtttatcctttttata 41400 

ccaccatcatattgcctattatatagattgtgtgtgttccattttctgta 41450 

atgggccagacagtaagtatttctggctttggagtccatatggtctctat 41500 

cataactactcatctctgccattgtagcttaaagattatctaggtcaaat 41550 

gcctaagtgatatagtgttgaaatacaagttatataatataggctgccac 41600 

aaaaaaaaatttatttggtctaaaaaagatttcatgacttttgtagcagc 41650 

atgggtggggcatgcaccacttggttaactcggtgtatctttctcctttg 41700 

cagATCTGTCCAACTCAATGGTCTAACTCTAAAGATGGTGGATGATCAAA 41750 
SVQLNGLTLKMVDDQ 

CCTTGCCACCTTTAATGGAAAAACCTCTCCGGCCAGGAAGTTCACTGGGC 41800 
TLPPLMEKPLRPGSSLG 

TTGCCAGCTTTCTCATATAGTTTTTTTGTGATAAGAAATGCCAAAGTTGC 41850 
LPAFSYSFFVIRNAKVA

TGCTTGCATCTGAAAATAAAATATACTAGTCCTGACACTGaatttttcaa 41900 
A C I * 

gtatactaagagtaaagcaactcaagttataggaaaggaagcagatacct 41950

tgcaaagcaactagtgggtgcttgagagacactgggacactgtcagtgct 42000

agatttagcacagtattttgatctcgctaggtagaacactgctaataata 42050

atagctaataataccttgttccaaatactgcttagcattttgcatgtttt 42100 

acttttatctaaagttttgtttt^ttttattatttatttatttatttatt 42150 

ttgagacagaatctctctctgtcacccaggctggagtgccatggtgcgat 42200 

cttggctcactgcaactttaagcaattctcctgcctcagcttcctgagta 42250 

gctgggattataggcgtgtgccaccacgcccagctactttctatattttt 42300

tgtagagatggagtttcgccatattggccaagctggtctcgaactcctgt 42350 

cctcgaactcctgtcctcaagtgatccacccgcctcagcctctcaaagtg 42400

ctgggattacaggtgtgagccaccacacccagcagtgttttatttttgag 42450

acagggtatcattctgttgcccaggcttgagtgcagtggtgcaatcatag 42500 

atcactgcagccttttaactcctgggctcaagtcatcctcctgcttagcc 42550 

tcccaagtagctaggaccacagacacatgccatcacacttggctattttt 42600

aaaaaattttttgtagagatggggtctcgctatgttacccaaactggtcc 42650 

tgaactcctggactcaattgatcctcccaccttggccttccaggtgctgg 42700

gatttctttgggagtacagcatggtacagcaggagatcatttgatgttac 42750 

ctctgtgcagtgttgctagtcagcgaaagactataatacctgtggggaca 42800 

gcgattagccaccacaaccagtctttatttaaagttattaaaaatggctg 42850 

ggcgcagtggctcacacctgtaatcctagcactttgggaggccgaggcag 42900 

atggatcacctgacgtgaggaatttgagaccagcctggccaacatggtga 42950 

aaccccatctctactaaaaaatacaaaaattagctgggtgtggtcctgta 43000 

gtcccagctacttgggaggctggggcaggagaattacttgaacccaggag 43050

gcagaggttgcagtgagccgagattgtgccactgcactccagcctgggtg 43100 

acagagagagattccatctcaaaaaaacaagttattaaaaatgtatatga 43150

atgctcctaatatggtcaggaagcaaggaagcgaaggatatattatgagt 43200

tttaagaaggtgcttagctgtatatttatctttcaaaatgtattagaaga 43250

ttttagaattctttccttcatgtgccatctctacaggcacccatcagaaa 43300 

aagcatactgccgttaccgtgaaactggttgtaaaagagaaactatctat 43350

ttgcaccttaaaagacagctagattttgctgattttcttctttcggtttt 43400
ctttgtcagcaataatatgtgagaggacagattgttagatatgatagtat 43450

aaaaaatggttaatgacaattcagaggcgaggagattctgtaaacttaaa 43500

attactataaatgaaattgatttgtcaagaggataaattttagaaaacac 43550

ccaataccttataactgtctgttaatgcttgctttttctctacctttctt 43600 

ccttgtttcagttgggaagcttttggctgcaagtaacagaaactcctaat 43650 

tcaaatggcttaagcaataaggaaatgtatattcccacataactagacgt 43700 

tcaaacaggccaggctccagcacttcagtacgtcaccagggatctgggtt 43750 

cttcccagctctctgctctgccatctttagcgctggcttcattctcagac 43800

tctggtagcatgatggctgtagctgtttcatgggccccttcaaacctcat 43850

agcaaccagaggaagaaaatgagccattttttgagtctccttcatagact 43900 

tgaataactctttttcagagcttctcacagcaaacctctcctcatgtctc 43950 

ctcatgtcttattgttcagaaatgggtaatgtggccatttcaccagtcac 44000

tgccaacaacaacgaggttcctataattgtctctgagtaaccctttggaa 44050 

tggagagggtgttggtcagtctacaaactgaacactgcagttctgcgctt 44100 

tttaccagtgaaaaaatgtaattattttcccctcttaaggattaatattc 44150 

ttcaaatgtatgcctgttatggatatagtatctttaaaattttttatttt 44200

aatagctttaggggtacacactttttgcttacaggggtgaattgtgtagt 44250 

ggtgaagactcggcttttaatgtacttgtcacctgagtgatgtacattgt 44300 

acccaataggtaatttttcatccattaccctccttccgccctcttccctt 44350

ctgagtctccaacatcccttataccactgtgtatgttcttgtgtacctac 44400

agctaagcttccacttataagtgagaacatgcagtatttggttttccatt 44450

cctgagttacttcccttaggataacagcccccagttccgtccaagttgct 44500

gcaaaatacattattcttctttatggctgagtaatagtccatggtacata 44550 

tataccacattttctttatccacttatcagttgatggacacttaggttaa 44600 

ttccattcaatttcattcaatttaagtatatttgtaaggagctaaagctg 44650 

aaaattaaattttagatctttcaatactcttaaattttatatgtaagtgg 44700

tttttatattttcacatttgaaataaagtaatttttataaccttgatatt 44750

gtatgactattcttttagtaatgtaaagcctacagactcctacatttgga 44800

accactagtgtgttgtttcaccccttgttatactatcaggatcctcga 44898
Figure 17

50 

human MLLRSKPALP PPLMLLLLGP LGPLSPGALP RPAQAQDVVD LDFFTQEPLH

mouse ML RLLLLWLWGP LGAIAQGAPA GTAPTDDWD LEFYTKRPIiR 

rat ' -LLLLWLWGR LRALTQGTPA GTAPTKEWD LEFYTKRLFQ 

100 

human LVSPSFLSVT IDANIATDPR FLILLGSPKL RTLARGLSPA YLRFGGTKTD
mouse SVSPSFLSIT IDASIATDPR EXTFIiGSPRI. RAIARGI^PA Y1^(FGGTKTD 

rat SVSPSFLSIT lOASIATDPR FltTEXSSPHL RALSRGI.SPA YWFGGTKTD 

150 

human FI.IFDPKKES TFCERSTHQS QVNQ|>ICKYG SIPFDVEEKL RI£fVFXQEQL 

mouse EXIFDPDKEP TSEERSTWKS QVNHDICRSE PVSAAVLRKL QVEWPFQ^LL 

rat FLIFDPNNEP TSEERSYKQS QDNNDICGSD RVSADVL 

200

human LLREHYQKKF KNSTYSRSSV DVLYTFANCS GLDLIFGLNA LLRTADLQWN

mouse IXREQY^^EF KNSTYSRSSV DMLXSFAKCS GIJ3I.XFG1MA XXBTPDLRRN 

rat 

250 

human SSNAQLLLDY CSSKGYNISW ELGNEPNSFL KKADIFINGS QLGEDYIQLH

mouse SSNAQLLLDY CSSKGYNISW EIXaiEPNSFW KKAHILIDGL QLGEDFVELH 

rat ~- ^^^^^^ , — , , ^. 

300 

human KLLRKSTFKN AKLYGPDVGQ PRRKTAKMLK SFLKAGGEVI DSVTWHHYYL

mouse KIJJQRSAFQN AKLYGPDIGQ PRGKTVKLLR SFLECAGGEVX DSLTWHHYYL 

rat 

350 

human NGRTATREDF LNPDVLDIFI SSVQKVFQVV ESTRPGKKVW LGETSSAYGG
mouse KGRIATKEDF LSSDAIJ3TFI LSVQKILKVT KEITPGKKVW LGETSSAYGG

rat — — 

400 

human GAPLLSDTFA AGFMWLDKLG LSARMGIEVV MRQVFFGAGN YHLVDENFDP
mouse GAPLLSNTFA AGFMWLDKLG LSAQMGIEVV MRQVFFGAGN YHLVDENFEP

rat 

450 

human LPDYWLSLLF KKLVGTKVLM ASVQGSKRRK LRVYLHCTNT DNPRYKEGDL
mouse LPDYWLSLLF KKLVGPRVLL SRVKGPDRSK LRVYLHCTNV YHPRYQEGDL

rat , 

500 

human TLYAINLHNV TKYLRLPYPF SNKQVDKYLL RPLGPHGLLS KSVQLNGLTL
mouse TLYVLNLHNV TKHLKVPPPL FRKPVDTYLL KPSGPDGLLS KSVQLNGQIL

rat L 

543 

human KMVDDQTLPP LMEKPLRPGS SLGLPAFSYS FFVIRNAKVA ACI-
mouse KMVDEQTLPA LTEKPLPAGS ALSLPAFSYG FFVIRNAKIA ACI-
rat KMVDEQTLPA LTEKPLPAGS SLSVPAFSYG FFVIRNAKIA ACI-
Figure 19



| MLLRSKPALPPPLMLLLLGPLGPLSPGALPRPAQAQDVVDLDFFTQEPLHLVSPSFLSVT | 60
PHD | EEEEE HHH EEEE EEE |

| IDANLATDPRFLILLGSPKLRTLARGLSPAYLRFGGTKTDFLIFDPKKESTFEERSYWQS | 120
PHD |EEE EEEEE HHHHHH HHHHE EEEEE HHHHHH|

| QVNQDICKYGSIPPDVEEKLRLEWPYQEQLLLREHYQKKFKNSTYSRSSVDVLYTFANCS | 180
PHD |HHHHHHHH HHHHHHH HHHHHHHHHHHHHHHH EEEEEEEEEEEE |

| GLDLIFGLNALLRTADLQWNSSNAQLLLDYCSSKGYNISWELGNEPNSFLKKADIFINGS | 240
PHD | HHHHHHHHHHHHHHHHHHHHHHHHHHHHHH EEEEE HHHHHHH EEEE |

| QLGEDYIQLHKLLRKSTFKNAKLYGPDVGQPRRKTAKMLKSFLKAGGEVIDSVTWHHYYL | 300
PHD | HHHHHHHHHHHHHHHHHH HHHHHHHHHHHHH EEEEEEEEEEE |

| NGRTATREDFLNPDVLDIFISSVQKVFQVVESTRPGKKVWLGETSSAYGGGAPLLSDTFA | 360
PHD | HHHHHHHHHHHEEEEEEE EEEEEE HHHHHHH |

| AGFMWLDKLGLSARMGIEVVMRQVFFGAGNYHLVDENFDPLPDYWLSLLFKKLVGTKVLM | 420
PHD |HHHHHHHH HHHH HHHHHHHHHHH EEEEE HHHHHHHHHHHH EEEEE |

| ASVQGSKRRKLRVYLHCTNTDNPRYKEGDLTLYAINLHNVTKYLRLPYPFSNKQVDKYLL | 480
PHD |EEE E EEEEEEEE EEEEEE EEEEE HHHHHHHH|

| RPLGPHGLLSKSVQLNGLTLKMVDDQTLPPLMEKPLRPGSSLGLPAFSYSFFVIRNAKVA | 540
PHD |HH EEEEEEE EEEEE EEEEEEEE EE |

| ACI |
PHD | |
SEQUENCE LISTING

(1) GENERAL INFORMATION: 

(i) APPLICANT: Iris Pecker, Israel Vlodavsky and Elena Feinstein

(ii) TITLE OF INVENTION: POLYNUCLEOTIDE ENCODING A POLYPEPTIDE
HAVING HEPARANASE ACTIVITY AND EXPRESSION
OF SAME IN GENETICALLY MODIFIED CELLS

(iii) NUMBER OF SEQUENCES: 47

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Mark M. Friedman c/o Anthony Castorina 

(B) STREET: 2001 Jefferson Davis Highway, Suite 207 

(C) CITY: Arlington 

(D) STATE: Virginia 

(E) COUNTRY: United States of America 

(F) ZIP: 22202

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: 1.44 megabyte, 3.5" microdisk 

(B) COMPUTER: Twinhead Slimnote-890TX

(C) OPERATING SYSTEM: MS DOS version 6.2, Windows version 3.11

(D) SOFTWARE: Word for Windows version 2.0 converted to an ASCII file

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: 08/922,170

(B) FILING DATE: 2 SEP 1997

(A) APPLICATION NUMBER: 09/109,386

(B) FILING DATE: 10 JUL 1998

(A) APPLICATION NUMBER: PCT/US98/17954

(B) FILING DATE: 31 AUG 1998

(A) APPLICATION NUMBER: 09/258,892

(B) FILING DATE: 1 MAR 1999



(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Friedman, Mark M.

(B) REGISTRATION NUMBER: 33,883 

(C) REFERENCE/DOCKET NUMBER: 910/62

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 972-3-5625553 

(B) TELEFAX: 972-3-5625554 

(C) TELEX: 

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
CCATCCTAAT ACGACTCACT ATAGGGC 27

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 
GTAGTGATGC CATGTAACTG AATC 24 

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
ACTCACTATA GGGCTCGAGC GGC 23 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: 
GCATCTTAGC CGTCTTTCTT CG 22 

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
TTTTTTTTTT TTTTT 15 

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
TTCGATCCCA AGAAGGAATC AAC 23 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
GTAGTGATGC CATGTAACTG AATC 24 

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
Tyr Gly Pro Asp Val Gly Gln Pro Arg
(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1721

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
CTAGAGCTTT CGACTCTCCG CTGCGCGGCA GCTGGCGGGG GGAGCAGCCA GGTGAGCCCA 60
AGATGCTGCT GCGCTCGAAG CCTGCGCTGC CGCCGCCGCT GATGCTGCTG CTCCTGGGGC 120 
CGCTGGGTCC CCTCTCCCCT GGCGCCCTGC CCCGACCTGC GCAAGCACAG GACGTCGTGG 180
ACCTGGACTT CTTCACCCAG GAGCCGCTGC ACCTGGTGAG CCCCTCGTTC CTGTCCGTCA 240
CCATTGACGC CAACCTGGCC ACGGACCCGC GGTTCCTCAT CCTCCTGGGT TCTCCAAAGC 300 
TTCGTACCTT GGCCAGAGGC TTGTCTCCTG CGTACCTGAG GTTTGGTGGC ACCAAGACAG 360 
ACTTCCTAAT TTTCGATCCC AAGAAGGAAT CAACCTTTGA AGAGAGAAGT TACTGGCAAT 420
CTCAAGTCAA CCAGGATATT TGCAAATATG GATCCATCCC TCCTGATGTG GAGGAGAAGT 480
TACGGTTGGA ATGGCCCTAC CAGGAGCAAT TGCTACTCCG AGAACACTAC CAGAAAAAGT 540
TCAAGAACAG CACCTACTCA AGAAGCTCTG TAGATGTGCT ATACACTTTT GCAAACTGCT 600 
CAGGACTGGA CTTGATCTTT GGCCTAAATG CGTTATTAAG AACAGCAGAT TTGCAGTGGA 660 
ACAGTTCTAA TGCTCAGTTG CTCCTGGACT ACTGCTCTTC CAAGGGGTAT AACATTTCTT 720 
GGGAACTAGG CAATGAACCT AACAGTTTCC TTAAGAAGGC TGATATTTTC ATCAATGGGT 780 
CGCAGTTAGG AGAAGATTAT ATTCAATTGC ATAAACTTCT AAGAAAGTCC ACCTTCAAAA 840 
ATGCAAAACT CTATGGTCCT GATGTTGGTC AGCCTCGAAG AAAGACGGCT AAGATGCTGA 900 
AGAGCTTCCT GAAGGCTGGT GGAGAAGTGA TTGATTCAGT TACATGGCAT CACTACTATT 960 
TGAATGGACG GACTGCTACC AGGGAAGATT TTCTAAACCC TGATGTATTG GACATTTTTA 1020 
TTTCATCTGT GCAAAAAGTT TTCCAGGTGG TTGAGAGCAC CAGGCCTGGC AAGAAGGTCT 1080 
GGTTAGGAGA AACAAGCTCT GCATATGGAG GCGGAGCGCC CTTGCTATCC GACACCTTTG 1140
CAGCTGGCTT TATGTGGCTG GATAAATTGG GCCTGTCAGC CCGAATGGGA ATAGAAGTGG 1200 
TGATGAGGCA AGTATTCTTT GGAGCAGGAA ACTACCATTT AGTGGATGAA AACTTCGATC 1260 
CTTTACCTGA TTATTGGCTA TCTCTTCTGT TCAAGAAATT GGTGGGCACC AAGGTGTTAA 1320 
TGGCAAGCGT GCAAGGTTCA AAGAGAAGGA AGCTTCGAGT ATACCTTCAT TGCACAAACA 1380 
CTGACAATCC AAGGTATAAA GAAGGAGATT TAACTCTGTA TGCCATAAAC CTCCATAACG 1440 
TCACCAAGTA CTTGCGGTTA CCCTATCCTT TTTCTAACAA GCAAGTGGAT AAATACCTTC 1500
TAAGACCTTT GGGACCTCAT GGATTACTTT CCAAATCTGT CCAACTCAAT GGTCTAACTC 1560 
TAAAGATGGT GGATGATCAA ACCTTGCCAC CTTTAATGGA AAAACCTCTC CGGCCAGGAA 1620
GTTCACTGGG CTTGCCAGCT TTCTCATATA GTTTTTTTGT GATAAGAAAT GCCAAAGTTG 1680
CTGCTTGCAT CTGAAAATAA AATATACTAG TCCTGACACT G 1721 

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 543 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
Met Leu Leu Arg Ser Lys Pro Ala Leu Pro Pro Pro Leu Met Leu Leu
5 10 15

Leu Leu Gly Pro Leu Gly Pro Leu Ser Pro Gly Ala Leu Pro Arg Pro
20 25 30

Ala Gln Ala Gln Asp Val Val Asp Leu Asp Phe Phe Thr Gln Glu Pro
35 40 45

Leu His Leu Val Ser Pro Ser Phe Leu Ser Val Thr Ile Asp Ala Asn
50 55 60

Leu Ala Thr Asp Pro Arg Phe Leu Ile Leu Leu Gly Ser Pro Lys Leu
65 70 75 80

Arg Thr Leu Ala Arg Gly Leu Ser Pro Ala Tyr Leu Arg Phe Gly Gly
85 90 95

Thr Lys Thr Asp Phe Leu Ile Phe Asp Pro Lys Lys Glu Ser Thr Phe
100 105 110

Glu Glu Arg Ser Tyr Trp Gln Ser Gln Val Asn Gln Asp Ile Cys Lys
115 120 125

Tyr Gly Ser Ile Pro Pro Asp Val Glu Glu Lys Leu Arg Leu Glu Trp
130 135 140

Pro Tyr Gln Glu Gln Leu Leu Leu Arg Glu His Tyr Gln Lys Lys Phe
145 150 155 160

Lys Asn Ser Thr Tyr Ser Arg Ser Ser Val Asp Val Leu Tyr Thr Phe
165 170 175

Ala Asn Cys Ser Gly Leu Asp Leu Ile Phe Gly Leu Asn Ala Leu Leu
180 185 190

Arg Thr Ala Asp Leu Gln Trp Asn Ser Ser Asn Ala Gln Leu Leu Leu
195 200 205

Asp Tyr Cys Ser Ser Lys Gly Tyr Asn Ile Ser Trp Glu Leu Gly Asn
210 215 220

Glu Pro Asn Ser Phe Leu Lys Lys Ala Asp Ile Phe Ile Asn Gly Ser
225 230 235 240

Gln Leu Gly Glu Asp Tyr Ile Gln Leu His Lys Leu Leu Arg Lys Ser
245 250 255

Thr Phe Lys Asn Ala Lys Leu Tyr Gly Pro Asp Val Gly Gln Pro Arg
260 265 270

Arg Lys Thr Ala Lys Met Leu Lys Ser Phe Leu Lys Ala Gly Gly Glu
275 280 285

Val Ile Asp Ser Val Thr Trp His His Tyr Tyr Leu Asn Gly Arg Thr
290 295 300

Ala Thr Arg Glu Asp Phe Leu Asn Pro Asp Val Leu Asp Ile Phe Ile
305 310 315 320

Ser Ser Val Gln Lys Val Phe Gln Val Val Glu Ser Thr Arg Pro Gly
325 330 335

Lys Lys Val Trp Leu Gly Glu Thr Ser Ser Ala Tyr Gly Gly Gly Ala
340 345 350

Pro Leu Leu Ser Asp Thr Phe Ala Ala Gly Phe Met Trp Leu Asp Lys
355 360 365

Leu Gly Leu Ser Ala Arg Met Gly Ile Glu Val Val Met Arg Gln Val
370 375 380

Phe Phe Gly Ala Gly Asn Tyr His Leu Val Asp Glu Asn Phe Asp Pro
385 390 395 400

Leu Pro Asp Tyr Trp Leu Ser Leu Leu Phe Lys Lys Leu Val Gly Thr
405 410 415

Lys Val Leu Met Ala Ser Val Gln Gly Ser Lys Arg Arg Lys Leu Arg
420 425 430

Val Tyr Leu His Cys Thr Asn Thr Asp Asn Pro Arg Tyr Lys Glu Gly
435 440 445

Asp Leu Thr Leu Tyr Ala Ile Asn Leu His Asn Val Thr Lys Tyr Leu
450 455 460

Arg Leu Pro Tyr Pro Phe Ser Asn Lys Gln Val Asp Lys Tyr Leu Leu
465 470 475 480

Arg Pro Leu Gly Pro His Gly Leu Leu Ser Lys Ser Val Gln Leu Asn
485 490 495

Gly Leu Thr Leu Lys Met Val Asp Asp Gln Thr Leu Pro Pro Leu Met
500 505 510

Glu Lys Pro Leu Arg Pro Gly Ser Ser Leu Gly Leu Pro Ala Phe Ser
515 520 525

Tyr Ser Phe Phe Val Ile Arg Asn Ala Lys Val Ala Ala Cys Ile
530 535 540 543

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1721 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

CT AGA GCT TTC GAC 14 

TCT CCG CTG CGC GGC AGC TGG CGG GGG GAG CAG CCA GGT GAG CCC AAG 62 

ATG CTG CTG CGC TCG AAG CCT GCG CTG CCG CCG CCG CTG ATG CTG CTG 110 
Met Leu Leu Arg Ser Lys Pro Ala Leu Pro Pro Pro Leu Met Leu Leu 
5 10 15 

CTC CTG GGG CCG CTG GGT CCC CTC TCC CCT GGC GCC CTG CCC CGA CCT 158 
Leu Leu Gly Pro Leu Gly Pro Leu Ser Pro Gly Ala Leu Pro Arg Pro 
20 25 30 

GCG CAA GCA CAG GAC GTC GTG GAC CTG GAC TTC TTC ACC CAG GAG CCG 206 
Ala Gln Ala Gln Asp Val Val Asp Leu Asp Phe Phe Thr Gln Glu Pro
35 40 45 
CTG CAC CTG GTG AGC CCC TCG TTC CTG TCC GTC ACC ATT GAC GCC AAC 254
Leu His Leu Val Ser Pro Ser Phe Leu Ser Val Thr Ile Asp Ala Asn
50 55 60 

CTG GCC ACG GAC CCG CGG TTC CTC ATC CTC CTG GGT TCT CCA AAG CTT 302
Leu Ala Thr Asp Pro Arg Phe Leu Ile Leu Leu Gly Ser Pro Lys Leu
65 70 75 80

CGT ACC TTG GCC AGA GGC TTG TCT CCT GCG TAC CTG AGG TTT GGT GGC 350
Arg Thr Leu Ala Arg Gly Leu Ser Pro Ala Tyr Leu Arg Phe Gly Gly 
85 90 95 

ACC AAG ACA GAC TTC CTA ATT TTC GAT CCC AAG AAG GAA TCA ACC TTT 398 
Thr Lys Thr Asp Phe Leu Ile Phe Asp Pro Lys Lys Glu Ser Thr Phe
100 105 110 

GAA GAG AGA AGT TAC TGG CAA TCT CAA GTC AAC CAG GAT ATT TGC AAA 446 
Glu Glu Arg Ser Tyr Trp Gln Ser Gln Val Asn Gln Asp Ile Cys Lys
115 120 125 

TAT GGA TCC ATC CCT CCT GAT GTG GAG GAG AAG TTA CGG TTG GAA TGG 494
Tyr Gly Ser Ile Pro Pro Asp Val Glu Glu Lys Leu Arg Leu Glu Trp
130 135 140 

CCC TAC CAG GAG CAA TTG CTA CTC CGA GAA CAC TAC CAG AAA AAG TTC 542 
Pro Tyr Gln Glu Gln Leu Leu Leu Arg Glu His Tyr Gln Lys Lys Phe
145 150 155 160 

AAG AAC AGC ACC TAC TCA AGA AGC TCT GTA GAT GTG CTA TAC ACT TTT 590
Lys Asn Ser Thr Tyr Ser Arg Ser Ser Val Asp Val Leu Tyr Thr Phe 
165 170 175 

GCA AAC TGC TCA GGA CTG GAC TTG ATC TTT GGC CTA AAT GCG TTA TTA 638 
Ala Asn Cys Ser Gly Leu Asp Leu Ile Phe Gly Leu Asn Ala Leu Leu
180 185 190 

AGA ACA GCA GAT TTG CAG TGG AAC AGT TCT AAT GCT CAG TTG CTC CTG 686 
Arg Thr Ala Asp Leu Gln Trp Asn Ser Ser Asn Ala Gln Leu Leu Leu
195 200 205 

GAC TAC TGC TCT TCC AAG GGG TAT AAC ATT TCT TGG GAA CTA GGC AAT 734 
Asp Tyr Cys Ser Ser Lys Gly Tyr Asn Ile Ser Trp Glu Leu Gly Asn
210 215 220 

GAA CCT AAC AGT TTC CTT AAG AAG GCT GAT ATT TTC ATC AAT GGG TCG 782 
Glu Pro Asn Ser Phe Leu Lys Lys Ala Asp Ile Phe Ile Asn Gly Ser
225 230 235 240 

CAG TTA GGA GAA GAT TAT ATT CAA TTG CAT AAA CTT CTA AGA AAG TCC 830 
Gln Leu Gly Glu Asp Tyr Ile Gln Leu His Lys Leu Leu Arg Lys Ser
245 250 255 



ACC TTC AAA AAT GCA AAA CTC TAT GGT CCT GAT GTT GGT CAG CCT CGA 878 
Thr Phe Lys Asn Ala Lys Leu Tyr Gly Pro Asp Val Gly Gln Pro Arg
260 265 270 
AGA AAG ACG GCT AAG ATG CTG AAG AGC TTC CTG AAG GCT GGT GGA GAA 926
Arg Lys Thr Ala Lys Met Leu Lys Ser Phe Leu Lys Ala Gly Gly Glu
275 280 285

GTG ATT GAT TCA GTT ACA TGG CAT CAC TAC TAT TTG AAT GGA CGG ACT 974 
Val Ile Asp Ser Val Thr Trp His His Tyr Tyr Leu Asn Gly Arg Thr
290 295 300 

GCT ACC AGG GAA GAT TTT CTA AAC CCT GAT GTA TTG GAC ATT TTT ATT 1022 
Ala Thr Arg Glu Asp Phe Leu Asn Pro Asp Val Leu Asp Ile Phe Ile
305 310 315 320 

TCA TCT GTG CAA AAA GTT TTC CAG GTG GTT GAG AGC ACC AGG CCT GGC 1070 
Ser Ser Val Gln Lys Val Phe Gln Val Val Glu Ser Thr Arg Pro Gly
325 330 335 

AAG AAG GTC TGG TTA GGA GAA ACA AGC TCT GCA TAT GGA GGC GGA GCG 1118 
Lys Lys Val Trp Leu Gly Glu Thr Ser Ser Ala Tyr Gly Gly Gly Ala 
340 345 350 

CCC TTG CTA TCC GAC ACC TTT GCA GCT GGC TTT ATG TGG CTG GAT AAA 1166 
Pro Leu Leu Ser Asp Thr Phe Ala Ala Gly Phe Met Trp Leu Asp Lys 
355 360 365 

TTG GGC CTG TCA GCC CGA ATG GGA ATA GAA GTG GTG ATG AGG CAA GTA 1214 
Leu Gly Leu Ser Ala Arg Met Gly Ile Glu Val Val Met Arg Gln Val
370 375 380 

TTC TTT GGA GCA GGA AAC TAC CAT TTA GTG GAT GAA AAC TTC GAT CCT 1262 
Phe Phe Gly Ala Gly Asn Tyr His Leu Val Asp Glu Asn Phe Asp Pro 
385 390 395 400 

TTA CCT GAT TAT TGG CTA TCT CTT CTG TTC AAG AAA TTG GTG GGC ACC 1310 
Leu Pro Asp Tyr Trp Leu Ser Leu Leu Phe Lys Lys Leu Val Gly Thr 
405 410 415 

AAG GTG TTA ATG GCA AGC GTG CAA GGT TCA AAG AGA AGG AAG CTT CGA 1358
Lys Val Leu Met Ala Ser Val Gln Gly Ser Lys Arg Arg Lys Leu Arg
420 425 430 

GTA TAC CTT CAT TGC ACA AAC ACT GAC AAT CCA AGG TAT AAA GAA GGA 1406
Val Tyr Leu His Cys Thr Asn Thr Asp Asn Pro Arg Tyr Lys Glu Gly
435 440 445 

GAT TTA ACT CTG TAT GCC ATA AAC CTC CAT AAC GTC ACC AAG TAC TTG 1454
Asp Leu Thr Leu Tyr Ala Ile Asn Leu His Asn Val Thr Lys Tyr Leu
450 455 460 

CGG TTA CCC TAT CCT TTT TCT AAC AAG CAA GTG GAT AAA TAC CTT CTA 1502 
Arg Leu Pro Tyr Pro Phe Ser Asn Lys Gln Val Asp Lys Tyr Leu Leu
465 470 475 480 

AGA CCT TTG GGA CCT CAT GGA TTA CTT TCC AAA TCT GTC CAA CTC AAT 1550 
Arg Pro Leu Gly Pro His Gly Leu Leu Ser Lys Ser Val Gln Leu Asn
485 490 495 



GGT CTA ACT CTA AAG ATG GTG GAT GAT CAA ACC TTG CCA CCT TTA ATG 1598 
Gly Leu Thr Leu Lys Met Val Asp Asp Gln Thr Leu Pro Pro Leu Met
500 505 510 

GAA AAA CCT CTC CGG CCA GGA AGT TCA CTG GGC TTG CCA GCT TTC TCA 1646 
Glu Lys Pro Leu Arg Pro Gly Ser Ser Leu Gly Leu Pro Ala Phe Ser
515 520 525 

TAT AGT TTT TTT GTG ATA AGA AAT GCC AAA GTT GCT GCT TGC ATC TGA 1694 
Tyr Ser Phe Phe Val Ile Arg Asn Ala Lys Val Ala Ala Cys Ile
530 535 540 543 

AAA TAA AAT ATA CTA GTC CTG ACA CTG 1721 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 824

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12 

CTGGCAAGAA GGTCTGGTTG GGAGAGACGA GCTCAGCTTA CGGTGGCGGT GCACCCTTGC 60
TGTCCAACAC CTTTGCAGCT GGCTTTATGT GGCTGGATAA ATTGGGCCTG TCAGCCCAGA 120 
TGGGCATAGA AGTCGTGATG AGGCAGGTGT TCTTCGGAGC AGGCAACTAC CACTTAGTGG 180
ATGAAAACTT TGAGCCTTTA CCTGATTACT GGCTCTCTCT TCTGTTCAAG AAACTGGTAG 240
GTCCCAGGGT GTTACTGTCA AGAGTGAAAG GCCCAGACAG GAGCAAACTC CGAGTGTATC 300
TCCACTGCAC TAACGTCTAT CACCCACGAT ATCAGGAAGG AGATCTAACT CTGTATGTCC 360 
TGAACCTCCA TAATGTCACC AAGCACTTGA AGGTACCGCC TCCGTTGTTC AGGAAACCAG 420 
TGGATACGTA CCTTCTGAAG CCTTCGGGGC CGGATGGATT ACTTTCCAAA TCTGTCCAAC 480
TGAACGGTCA AATTCTGAAG ATGGTGGATG AGCAGACCCT GCCAGCTTTG ACAGAAAAAC 540
CTCTCCCCGC AGGAAGTGCA CTAAGCCTGC CTGCCTTTTC CTATGGTTTT TTTGTCATAA 600 
GAAATGCCAA AATCGCTGCT TGTATATGAA AATAAAAGGC ATACGGTACC CCTGAGACAA 660
AAGCCGAGGG GGGTGTTATT CATAAAACAA AACCCTAGTT TAGGAGGCCA CCTCCTTGCC 720
GAGTTCCAGA GCTTCGGGAG GGTGGGGTAC ACTTCAGTAT TACATTCAGT GTGGTGTTCT 780
CTCTAAGAAG AATACTGCAG GTGGTGACAG TTAATAGCAC TGTG 824

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1899 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13

GGGAAAGCGA GCAAGGAAGT AGGAGAGAGC CGGGCAGGCG GGGCGGGGTT GGATTGGGAG 60 

CAGTGGGAGG GATGCAGAAG AGGAGTGGGA GGGATGGAGG GCGCAGTGGG AGGGGTGAGG 120 

AGGCGTAACG GGGCGGAGGA AAGGAGAAAA GGGCGCTGGG GCTCGGCGGG AGGAAGTGCT 180

AGAGCTCTCG ACTCTCCGCT GCGCGGCAGC TGGCGGGGGG AGCAGCCAGG TGAGCCCAAG 240

ATGCTGCTGC GCTCGAAGCC TGCGCTGCCG CCGCCGCTGA TGCTGCTGCT CCTGGGGCCG 300 

CTGGGTCCCC TCTCCCCTGG CGCCCTGCCC CGACCTGCGC AAGCACAGGA CGTCGTGGAC 360

CTGGACTTCT TCACCCAGGA GCCGCTGCAC CTGGTGAGCC CCTCGTTCCT GTCCGTCACC 420 

ATTGACGCCA ACCTGGCCAC GGACCCGCGG TTCCTCATCC TCCTGGGTTC TCCAAAGCTT 480

CGTACCTTGG CCAGAGGCTT GTCTCCTGCG TACCTGAGGT TTGGTGGCAC CAAGACAGAC 540 

TTCCTAATTT TCGATCCCAA GAAGGAATCA ACCTTTGAAG AGAGAAGTTA CTGGCAATCT 600

CAAGTCAACC AGGATATTTG CAAATATGGA TCCATCCCTC CTGATGTGGA GGAGAAGTTA 660 

CGGTTGGAAT GGCCCTACCA GGAGCAATTG CTACTCCGAG AACACTACCA GAAAAAGTTC 720 

AAGAACAGCA CCTACTCAAG AAGCTCTGTA GATGTGCTAT ACACTTTTGC AAACTGCTCA 780 
GGACTGGACT TGATCTTTGG CCTAAATGCG TTATTAAGAA CAGCAGATTT GCAGTGGAAC 840

AGTTCTAATG CTCAGTTGCT CCTGGACTAC TGCTCTTCCA AGGGGTATAA CATTTCTTGG 900

GAACTAGGCA ATGAACCTAA CAGTTTCCTT AAGAAGGCTG ATATTTTCAT CAATGGGTCG 960 

CAGTTAGGAG AAGATTATAT TCAATTGCAT AAACTTCTAA GAAAGTCCAC CTTCAAAAAT 1020 

GCAAAACTCT ATGGTCCTGA TGTTGGTCAG CCTCGAAGAA AGACGGCTAA GATGCTGAAG 1080 

AGCTTCCTGA AGGCTGGTGG AGAAGTGATT GATTCAGTTA CATGGCATCA CTACTATTTG 1140

AATGGACGGA CTGCTACCAG GGAAGATTTT CTAAACCCTG ATGTATTGGA CATTTTTATT 1200 

TCATCTGTGC AAAAAGTTTT CCAGGTGGTT GAGAGCACCA GGCCTGGCAA GAAGGTCTGG 1260 

TTAGGAGAAA CAAGCTCTGC ATATGGAGGC GGAGCGCCCT TGCTATCCGA CACCTTTGCA 1320 

GCTGGCTTTA TGTGGCTGGA TAAATTGGGC CTGTCAGCCC GAATGGGAAT AGAAGTGGTG 1380 

ATGAGGCAAG TATTCTTTGG AGCAGGAAAC TACCATTTAG TGGATGAAAA CTTCGATCCT 1440 

TTACCTGATT ATTGGCTATC TCTTCTGTTC AAGAAATTGG TGGGCACCAA GGTGTTAATG 1500

GCAAGCGTGC AAGGTTCAAA GAGAAGGAAG CTTCGAGTAT ACCTTCATTG CACAAACACT 1560

GACAATCCAA GGTATAAAGA AGGAGATTTA ACTCTGTATG CCATAAACCT CCATAACGTC 1620 

ACCAAGTACT TGCGGTTACC CTATCCTTTT TCTAACAAGC AAGTGGATAA ATACCTTCTA 1680 

AGACCTTTGG GACCTCATGG ATTACTTTCC AAATCTGTCC AACTCAATGG TCTAACTCTA 1740 

AAGATGGTGG ATGATCAAAC CTTGCCACCT TTAATGGAAA AACCTCTCCG GCCAGGAAGT 1800 

TCACTGGGCT TGCCAGCTTT CTCATATAGT TTTTTTGTGA TAAGAAATGC CAAAGTTGCT 1860 

GCTTGCATCT GAAAATAAAA TATACTAGTC CTGACACTG 1899 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 592 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14 

Met Glu Gly Ala Val Gly Gly Val Arg Arg Arg Asn Gly Ala Glu
5 10 15

Glu Arg Arg Lys Gly Arg Trp Gly Ser Ala Gly Gly Ser Ala Arg
20 25 30

Ala Leu Asp Ser Pro Leu Arg Gly Ser Trp Arg Gly Glu Gln Pro
35 40 45

Gly Glu Pro Lys Met Leu Leu Arg Ser Lys Pro Ala Leu Pro Pro
50 55 60

Pro Leu Met Leu Leu Leu Leu Gly Pro Leu Gly Pro Leu Ser Pro
65 70 75

Gly Ala Leu Pro Arg Pro Ala Gln Ala Gln Asp Val Val Asp Leu
80 85 90

Asp Phe Phe Thr Gln Glu Pro Leu His Leu Val Ser Pro Ser Phe
95 100 105

Leu Ser Val Thr Ile Asp Ala Asn Leu Ala Thr Asp Pro Arg Phe
110 115 120

Leu Ile Leu Leu Gly Ser Pro Lys Leu Arg Thr Leu Ala Arg Gly
125 130 135

Leu Ser Pro Ala Tyr Leu Arg Phe Gly Gly Thr Lys Thr Asp Phe
140 145 150

Leu Ile Phe Asp Pro Lys Lys Glu Ser Thr Phe Glu Glu Arg Ser
155 160 165

Tyr Trp Gln Ser Gln Val Asn Gln Asp Ile Cys Lys Tyr Gly Ser
170 175 180

Ile Pro Pro Asp Val Glu Glu Lys Leu Arg Leu Glu Trp Pro Tyr
185 190 195

Gln Glu Gln Leu Leu Leu Arg Glu His Tyr Gln Lys Lys Phe Lys
200 205 210

Asn Ser Thr Tyr Ser Arg Ser Ser Val Asp Val Leu Tyr Thr Phe
215 220 225

Ala Asn Cys Ser Gly Leu Asp Leu Ile Phe Gly Leu Asn Ala Leu
230 235 240

Leu Arg Thr Ala Asp Leu Gln Trp Asn Ser Ser Asn Ala Gln Leu
245 250 255

Leu Leu Asp Tyr Cys Ser Ser Lys Gly Tyr Asn Ile Ser Trp Glu
260 265 270

Leu Gly Asn Glu Pro Asn Ser Phe Leu Lys Lys Ala Asp Ile Phe
275 280 285

Ile Asn Gly Ser Gln Leu Gly Glu Asp Tyr Ile Gln Leu His Lys
290 295 300

Leu Leu Arg Lys Ser Thr Phe Lys Asn Ala Lys Leu Tyr Gly Pro
305 310 315

Asp Val Gly Gln Pro Arg Arg Lys Thr Ala Lys Met Leu Lys Ser
320 325 330

Phe Leu Lys Ala Gly Gly Glu Val Ile Asp Ser Val Thr Trp His
335 340 345

His Tyr Tyr Leu Asn Gly Arg Thr Ala Thr Arg Glu Asp Phe Leu
350 355 360

Asn Pro Asp Val Leu Asp Ile Phe Ile Ser Ser Val Gln Lys Val
365 370 375

Phe Gln Val Val Glu Ser Thr Arg Pro Gly Lys Lys Val Trp Leu
380 385 390

Gly Glu Thr Ser Ser Ala Tyr Gly Gly Gly Ala Pro Leu Leu Ser
395 400 405

Asp Thr Phe Ala Ala Gly Phe Met Trp Leu Asp Lys Leu Gly Leu
410 415 420

Ser Ala Arg Met Gly Ile Glu Val Val Met Arg Gln Val Phe Phe
425 430 435

Gly Ala Gly Asn Tyr His Leu Val Asp Glu Asn Phe Asp Pro Leu
440 445 450

Pro Asp Tyr Trp Leu Ser Leu Leu Phe Lys Lys Leu Val Gly Thr
455 460 465

Lys Val Leu Met Ala Ser Val Gln Gly Ser Lys Arg Arg Lys Leu
470 475 480

Arg Val Tyr Leu His Cys Thr Asn Thr Asp Asn Pro Arg Tyr Lys
485 490 495

Glu Gly Asp Leu Thr Leu Tyr Ala Ile Asn Leu His Asn Val Thr
500 505 510

Lys Tyr Leu Arg Leu Pro Tyr Pro Phe Ser Asn Lys Gln Val Asp
515 520 525

Lys Tyr Leu Leu Arg Pro Leu Gly Pro His Gly Leu Leu Ser Lys
530 535 540

Ser Val Gln Leu Asn Gly Leu Thr Leu Lys Met Val Asp Asp Gln
545 550 555

Thr Leu Pro Pro Leu Met Glu Lys Pro Leu Arg Pro Gly Ser Ser
560 565 570

Leu Gly Leu Pro Ala Phe Ser Tyr Ser Phe Phe Val Ile Arg Asn
575 580 585

Ala Lys Val Ala Ala Cys Ile
590 592

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1899 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15

GGG 3 

AAA GCG AGC AAG GAA GTA GGA GAG AGC CGG GCA GGC GGG GCG GGG 48
TTG GAT TGG GAG CAG TGG GAG GGA TGC AGA AGA GGA GTG GGA GGG 93 
ATG GAG GGC GCA GTG GGA GGG GTG AGG AGG CGT AAC GGG GCG GAG 138 
Met Glu Gly Ala Val Gly Gly Val Arg Arg Arg Asn Gly Ala Glu 
5 10 15 

GAA AGG AGA AAA GGG CGC TGG GGC TCG GCG GGA GGA AGT GCT AGA 183
Glu Arg Arg Lys Gly Arg Trp Gly Ser Ala Gly Gly Ser Ala Arg
20 25 30 

GCT CTC GAC TCT CCG CTG CGC GGC AGC TGG CGG GGG GAG CAG CCA 228
Ala Leu Asp Ser Pro Leu Arg Gly Ser Trp Arg Gly Glu Gln Pro
35 40 45 

GGT GAG CCC AAG ATG CTG CTG CGC TCG AAG CCT GCG CTG CCG CCG 273 
Gly Glu Pro Lys Met Leu Leu Arg Ser Lys Pro Ala Leu Pro Pro 
50 55 60 

CCG CTG ATG CTG CTG CTC CTG GGG CCG CTG GGT CCC CTC TCC CCT 318
Pro Leu Met Leu Leu Leu Leu Gly Pro Leu Gly Pro Leu Ser Pro
65 70 75 

GGC GCC CTG CCC CGA CCT GCG CAA GCA CAG GAC GTC GTG GAC CTG 363
Gly Ala Leu Pro Arg Pro Ala Gln Ala Gln Asp Val Val Asp Leu
80 85 90

GAC TTC TTC ACC CAG GAG CCG CTG CAC CTG GTG AGC CCC TCG TTC 408
Asp Phe Phe Thr Gln Glu Pro Leu His Leu Val Ser Pro Ser Phe
95 100 105 

CTG TCC GTC ACC ATT GAC GCC AAC CTG GCC ACG GAC CCG CGG TTC 453
Leu Ser Val Thr Ile Asp Ala Asn Leu Ala Thr Asp Pro Arg Phe
110 115 120 

CTC ATC CTC CTG GGT TCT CCA AAG CTT CGT ACC TTG GCC AGA GGC 498
Leu Ile Leu Leu Gly Ser Pro Lys Leu Arg Thr Leu Ala Arg Gly
125 130 135 

TTG TCT CCT GCG TAC CTG AGG TTT GGT GGC ACC AAG ACA GAC TTC 543
Leu Ser Pro Ala Tyr Leu Arg Phe Gly Gly Thr Lys Thr Asp Phe
140 145 150 

CTA ATT TTC GAT CCC AAG AAG GAA TCA ACC TTT GAA GAG AGA AGT 588
Leu Ile Phe Asp Pro Lys Lys Glu Ser Thr Phe Glu Glu Arg Ser
155 160 165 

TAC TGG CAA TCT CAA GTC AAC CAG GAT ATT TGC AAA TAT GGA TCC 633 
Tyr Trp Gln Ser Gln Val Asn Gln Asp Ile Cys Lys Tyr Gly Ser
170 175 180 



ATC CCT CCT GAT GTG GAG GAG AAG TTA CGG TTG GAA TGG CCC TAC 678
Ile Pro Pro Asp Val Glu Glu Lys Leu Arg Leu Glu Trp Pro Tyr
185 190 195

CAG GAG CAA TTG CTA CTC CGA GAA CAC TAC CAG AAA AAG TTC AAG 723
Gln Glu Gln Leu Leu Leu Arg Glu His Tyr Gln Lys Lys Phe Lys
200 205 210 

AAC AGC ACC TAC TCA AGA AGC TCT GTA GAT GTG CTA TAC ACT TTT 768 
Asn Ser Thr Tyr Ser Arg Ser Ser Val Asp Val Leu Tyr Thr Phe
215 220 225 

GCA AAC TGC TCA GGA CTG GAC TTG ATC TTT GGC CTA AAT GCG TTA 813 
Ala Asn Cys Ser Gly Leu Asp Leu Ile Phe Gly Leu Asn Ala Leu
230 235 240 

TTA AGA ACA GCA GAT TTG CAG TGG AAC AGT TCT AAT GCT CAG TTG 858 
Leu Arg Thr Ala Asp Leu Gln Trp Asn Ser Ser Asn Ala Gln Leu
245 250 255 

CTC CTG GAC TAC TGC TCT TCC AAG GGG TAT AAC ATT TCT TGG GAA 903 
Leu Leu Asp Tyr Cys Ser Ser Lys Gly Tyr Asn Ile Ser Trp Glu
260 265 270 

CTA GGC AAT GAA CCT AAC AGT TTC CTT AAG AAG GCT GAT ATT TTC 948
Leu Gly Asn Glu Pro Asn Ser Phe Leu Lys Lys Ala Asp Ile Phe
275 280 285 

ATC AAT GGG TCG CAG TTA GGA GAA GAT TAT ATT CAA TTG CAT AAA 993 
Ile Asn Gly Ser Gln Leu Gly Glu Asp Tyr Ile Gln Leu His Lys
290 295 300 

CTT CTA AGA AAG TCC ACC TTC AAA AAT GCA AAA CTC TAT GGT CCT 1038 
Leu Leu Arg Lys Ser Thr Phe Lys Asn Ala Lys Leu Tyr Gly Pro 
305 310 315 

GAT GTT GGT CAG CCT CGA AGA AAG ACG GCT AAG ATG CTG AAG AGC 1083 
Asp Val Gly Gln Pro Arg Arg Lys Thr Ala Lys Met Leu Lys Ser
320 325 330 

TTC CTG AAG GCT GGT GGA GAA GTG ATT GAT TCA GTT ACA TGG CAT 1128 
Phe Leu Lys Ala Gly Gly Glu Val Ile Asp Ser Val Thr Trp His
335 340 345 

CAC TAC TAT TTG AAT GGA CGG ACT GCT ACC AGG GAA GAT TTT CTA 1173 
His Tyr Tyr Leu Asn Gly Arg Thr Ala Thr Arg Glu Asp Phe Leu 
350 355 360 

AAC CCT GAT GTA TTG GAC ATT TTT ATT TCA TCT GTG CAA AAA GTT 1218 
Asn Pro Asp Val Leu Asp Ile Phe Ile Ser Ser Val Gln Lys Val
365 370 375 

TTC CAG GTG GTT GAG AGC ACC AGG CCT GGC AAG AAG GTC TGG TTA 1263 
Phe Gln Val Val Glu Ser Thr Arg Pro Gly Lys Lys Val Trp Leu
380 385 390 

GGA GAA ACA AGC TCT GCA TAT GGA GGC GGA GCG CCC TTG CTA TCC 1308 
Gly Glu Thr Ser Ser Ala Tyr Gly Gly Gly Ala Pro Leu Leu Ser 
395 400 405 
GAC ACC TTT GCA GCT GGC TTT ATG TGG CTG GAT AAA TTG GGC CTG 1353
Asp Thr Phe Ala Ala Gly Phe Met Trp Leu Asp Lys Leu Gly Leu 
410 415 420 

TCA GCC CGA ATG GGA ATA GAA GTG GTG ATG AGG CAA GTA TTC TTT 1398
Ser Ala Arg Met Gly Ile Glu Val Val Met Arg Gln Val Phe Phe
425 430 435

GGA GCA GGA AAC TAC CAT TTA GTG GAT GAA AAC TTC GAT CCT TTA 1443
Gly Ala Gly Asn Tyr His Leu Val Asp Glu Asn Phe Asp Pro Leu 
440 445 450 

CCT GAT TAT TGG CTA TCT CTT CTG TTC AAG AAA TTG GTG GGC ACC 1488
Pro Asp Tyr Trp Leu Ser Leu Leu Phe Lys Lys Leu Val Gly Thr 
455 460 465 

AAG GTG TTA ATG GCA AGC GTG CAA GGT TCA AAG AGA AGG AAG CTT 1533 
Lys Val Leu Met Ala Ser Val Gln Gly Ser Lys Arg Arg Lys Leu
470 475 480 

CGA GTA TAC CTT CAT TGC ACA AAC ACT GAC AAT CCA AGG TAT AAA 1578
Arg Val Tyr Leu His Cys Thr Asn Thr Asp Asn Pro Arg Tyr Lys
485 490 495 

GAA GGA GAT TTA ACT CTG TAT GCC ATA AAC CTC CAT AAC GTC ACC 1623 
Glu Gly Asp Leu Thr Leu Tyr Ala Ile Asn Leu His Asn Val Thr
500 505 510

AAG TAC TTG CGG TTA CCC TAT CCT TTT TCT AAC AAG CAA GTG GAT 1668 
Lys Tyr Leu Arg Leu Pro Tyr Pro Phe Ser Asn Lys Gln Val Asp
515 520 525 

AAA TAC CTT CTA AGA CCT TTG GGA CCT CAT GGA TTA CTT TCC AAA 1713 
Lys Tyr Leu Leu Arg Pro Leu Gly Pro His Gly Leu Leu Ser Lys 
530 535 540 

TCT GTC CAA CTC AAT GGT CTA ACT CTA AAG ATG GTG GAT GAT CAA 1758 
Ser Val Gln Leu Asn Gly Leu Thr Leu Lys Met Val Asp Asp Gln
545 550 555 

ACC TTG CCA CCT TTA ATG GAA AAA CCT CTC CGG CCA GGA AGT TCA 1803 
Thr Leu Pro Pro Leu Met Glu Lys Pro Leu Arg Pro Gly Ser Ser 
560 565 570 

CTG GGC TTG CCA GCT TTC TCA TAT AGT TTT TTT GTG ATA AGA AAT 1848
Leu Gly Leu Pro Ala Phe Ser Tyr Ser Phe Phe Val Ile Arg Asn
575 580 585 

GCC AAA GTT GCT GCT TGC ATC TGA AAA TAA AAT ATA CTA GTC CTG 1893
Ala Lys Val Ala Ala Cys Ile
590 592

ACA CTG 1899 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 594 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16



ATTACTATAG GGCACGCGTG GTCGACGGCC CGGGCTGGTA TTGTCTTAAT GAGAAGTTGA 60
TAAAGAATTT TGGGTGGTTG ATCTCTTTCC AGCTGCAGTT TAGCGTATGC TGAGGCCAGA 120
TTTTTTCAGG CAAAAGTAAA ATACCTGAGA AACTGCCTGG CCAGAGGACA ATCAGATTTT 180
GGCTGGCTCA AGTGACAAGC AAGTGTTTAT AAGCTAGATG GGAGAGGAAG GGATGAATAC 240
TCCATTGGAG GCTTTACTCG AGGGTCAGAG GGATACCCGG CGCCATCAGA ATGGGATCTG 300
GGAGTCGGAA ACGCTGGGTT CCCACGAGAG CGCGCAGAAC ACGTGCGTCA GGAAGCCTGG 360
TCCGGGATGC CCAGCGCTGC TCCCCGGGCG CTCCTCCCCG GGCGCTCCTC CCCAGGCCTC 420
CCGGGCGCTT GGATCCCGGC CATCTCCGCA CCCTTCAAGT GGGTGTGGGT GATTTCGTAA 480
GTGAACGTGA CCGCCACCGG GGGGAAAGCG AGCAAGGAAG TAGGAGAGAG CCGGGCAGGC 540
GGGGCGGGGT TGGATTGGGA GCAGTGGGAG GGATGCAGAA GAGGAGTGGG AGGG 594



(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17
CCCCAGGAGC AGCAGCATCA G 21



(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18
AGGCTTCGAG CGCAGCAGCA T 21



(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19
GTAATACGAC TCACTATAGG GC 22



(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20
ACTATAGGGC ACGCGTGGT 19



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21
CTTGGGCTCA CCTGGCTGCT C 21 

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22 
AGCTCTGTAG ATGTGCTATA CAC 23 

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23 
GCATCTTAGC CGTCTTTCTT CG 22 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24 
GAGCAGCCAG GTGAGCCCAA GAT 23 

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25 
TTCGATCCCA AGAAGGAATC AAC 23 

(2) INFORMATION FOR SEQ ID NO:26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26 
AGCTCTGTAG ATGTGCTATA CAC 23 

(2) INFORMATION FOR SEQ ID NO:27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27 
TCAGATGCAA GCAGCAACTT TGGC 24 
(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28 
GCATCTTAGC CGTCTTTCTT CG 22 

(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29 
GTAGTGATGC CATGTAACTG AATC 24

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30
AGGCACCCTA GAGATGTTCC AG 22 

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31 
GAAGATTTCT GTTTCCATGA CGTG 24 

(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32 
CCACACTGAA TGTAATACTG AAGTG 25 

(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33 
CGAAGCTCTG GAACTCGGCA AG 22 



(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34
GCCAGCTGCA AAGGTGTTGG AC 22



(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35
AACACCTGCC TCATCACGAC TTC 23



(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36
GCCAGGCTGG CGTCGATGGT GA 22



(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37
GTCGATGGTG ATGGACAGGA AC 22



(2) INFORMATION FOR SEQ ID NO: 38:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38
GTAATACGAC TCACTATAGG GC 22



(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39
ACTATAGGGC ACGCGTGGT 19



(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40
CCATCCTAAT ACGACTCACT ATAGGGC 27

(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41 
ACTCACTATA GGGCTCGAGC GGC 23 

(2) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS: 



(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42 

GGATCTTGGC TCACTGCAAT CTCTGCCTCC CATGCAATTC TTATGCATCA 50

GCCTCCTGAG TAGCTTGGAT TATAGGTCTG CGCCACCACT CCTGGCTACA 100 

CCATGTTGCC CAGGCTGGTC TTGAACTCTT GGGCTCTAGT GATCCACCCG 150 

CCTTGGCCTC CCAAAGTGCT GGGATTACAG GTGTGAGCCA TCACACCCGG 200 

CCCCCCGTTT CCATATTAGT AACTCACATG TAGACCACAA GGATGCACTA 250 

TTTAGAAAAC TTGCAATGGT CCACTTTTCA AATCACCCAA ACATGTTAAA 300 

GAAATTGGTA TGACTGGGCA TGGCACAGTG GCTCATGCCT GCAATCCTAG 350 

CATTTTGTGA GGCTGAGACG GGCAGATCAC GAGGTCAGGA GATTGAGACC 400

ATCCTGACAG ACATGGTGAA ATCCCATCTC TACTAAAAAT ACAAAACAAT 450

TAGCCGGGGG TGATGGCAGG CCCCTGTAGT CCCAGCTACT CGGGAGGCTG 500 

AGGCAGGAGA ATGGCGTGAA TCCAGGAGGC AGAGCTTGCA GTGAGCCGAG 550

ATGGTGCCAC TGCACTCCAG CCTGGGCGAC AGAGCGAGAC TCCGTCTCAA 600 

AAAAAAAAAA AAAGAAAGAA ATTGGTATGA CTGTTGACTC ACAACAGGAG 650 

TCAGGGGCAT GGGGTGGGGT GTAAGATTAA TGTCATGACA AATGTGGAAA 700 

AGAAACTTCT GTTTTTCCAA CTCCACGTCT GCTACCATAT TATTACACTC 750 

TTCTGGTAGT GTGGTGTTTA TGTGTGAATT TTTTTTCATA TGTATACAGT 800 

AATTGTAGGA TATGAACCTG ATTCTAGTTG CAAAACTCAC TATGAGCTTA 850 

GCTTTTAAGT TGCTTAAGAA TAGGTAGATC TATGCAAATA ATGATAATTA 900 

TTATTATTAT TTTAAGAGAG GGTCTCACTT TGTCACCCAG GCTGGAGTGC 950 

AGTGGTGTGA TTAAGGGTCA CTGCAACCTC CACCTCCCAG GCTCAAATAA 1000 

ACCTCCCACC TCAGCCTCCC CAGTAGCTGG AACCACAGGC ACGGGCCACC 1050 

ACGCCTGGCT AATTTTTTGT ATTTTTTGTA GAGATGGGGT TTCATCATGT 1100 

TGCCCAGGCT GTTCTTGAAT TCCTCGGCTC AAGCAATCCT CCCACCTTGG 1150 

CCTCCCAAAA TGCTGGCATC ACAGGCATGA TGGCATCACT GGCATCACAT 1200 

ACCATGCCTG GCCTGATTTA TGCAAATTAG ATATGCATTT CAAAATAATC 1250 

TATTTTTATT TGTTGCCTTA TTGGTGGTAC AATCTCAAGT GGAAAAATCT 1300

AAGGGTTTTG GTGTTATTTG CTTACTCAAC CAATATTTAT TAGACTCTTA 1350 

CTAAGCACCA ACATGATCAC ATGCCTGAGC TATGGCTAGC ATAGCGTGTG 1400 

AGACAAACTT AATCTCTGTT TTGGTGGAGC ATATAATCTA GTAGATGAAG 1450 

CCAATGTTGA GCAACATCAC AATACTAACA AATTGAGGAT GCTACGAGAG 1500 

TGTCTAACAA ATTGAGGATG CTACGAGAGT GTCTAACAAA TTGAGGATGC 1550

TATGAGAGTG TGTCATGGAG AGCTGCCTGG AGATTGAGAG AAAGCTTCCT 1600 

TGAGGGAAGT TACATTTCAG CTGAAACACA CTGCCATCTG CTCGAGGTTT 1650 

TGTAACTGCA TTCACATCCC GATTCTGACA CTTCACATCC CGATTCTGAC 1700 

ACTTCACCCA GTTACTGTCT CAGAGCTTGG GTCCGCATGT GTAAAACAAG 1750 

GACAGTATGC ACTTGGCAGG GTTGTGAGAA GGGAAGAGAA CACAAGTAAA 1800 

GCACCTGTAT CAGGCATACA GTAGGCACTA AGCGTGCGAT GCTTGCTATG 1850 

ATTATACATC AGTGTAAGCA TCAAGG7WUVA GCTGAAGAAA AGTCTGACCA 1900 

ACAGCGAAAG ATAAATGCGC AGAGGAGAAA TTTGGCAAAG GCTCCAAATT 1950 

CAGGGGCAGT CCGTACTCTA CACTTTGTAT GGGGGCTTCA GGTCCTGAGT 2000

TCCAGACATT GGAGCAACTA ACCCTTTAAG ATTGCTAAAT ATTGTCTTAA 2050 

TGAGAAGTTG ATAAAGAATT TTGGGTGGTT GATCTCTTTC CAGCTGCAGT 2100 

TTAGCGTATG CTGAGGCCAG ATTTTTTCAA GCAAAAGTAA AATACCTGAG 2150 

AAACTGCCTG GCCAGAGGAC AATCAGATTT TGGCTGGCTC AAGTGACAAG 2200 

CAAGTGTTTA TAAGCTAGAT GGGAGAGGAA GGGATGAATA CTCCATTGGA 2250 

GGTTTTACTC GAGGGTCAGA GGGATACCCG GCGCCATCAG AATGGGATCT 2300 

GGGAGTCGGA AACGCTGGGT TCCCACGAGA GCGCGCAGAA CACGTGCGTC 2350 

AGGAAGCCTG GTCCGGGATG CCCAGCGCTG CTCCCCGGGC GCTCCTCCCC 2400
GGGCGCTCCT CCCCAGGCCT CCCGGGCGCT TGGATCCCGG CCATCTCCGC 2450
ACCCTTCAAG TGGGTGTGGG TGATTTCGTA AGTGAACGTG ACCGCCACCG 2500 

AGGGGAAAGC GAGCAAGGAA GTAGGAGAGA GCCGGGCAGG CGGGGCGGGG 2550
TTGGATTGGG AGCAGTGGGA GGGATGCAGA AGAGGAGTGG GAGGGATGGA 2600 

GGGCGCAGTG GGAGGGGTGA GGAGGCGTAA CGGGGCGGAG GAAAGGAGAA 2650 

AAGGGCGCTG GGGCTCGGCG GGAGGAAGTG CTAGAGCTCT CGACTCTCCG 2700 

CTGCGCGGCA GCTGGCGGGG GGAGCAGCCA GGTGAGCCCA AGATGCTGCT 2750

GCGCTCGAAG CCTGCGCTGC CGCCGCCGCT GATGCTGCTG CTCCTGGGGC 2800 

CGCTGGGTCC CCTCTCCCCT GGCGCCCTGC CCCGACCTGC GCAAGCACAG 2850 

GACGTCGTGG ACCTGGACTT CTTCACCCAG GAGCCGCTGC ACCTGGTGAG 2900

CCCCTCGTTC CTGTCCGTCA CCATTGACGC CAACCTGGCC ACGGACCCGC 2950 

GGTTCCTCAT CCTCCTGGGG TAAGCGCCAG CCTCCTGGTC CTGTCCCCTT 3000 

TCCTGTCCTC CTGACACCTA TGTCTGCCCC GCXMCGGCT CTCCTTCTTT 3050 

TGCGCGGAAA CAACTTCACA CCGGAACCTC CCCGCCTGTC TCTCCCCACC 3100 

CCACTTCCCG CCTCTCATTC TCCCTCTCCC TCCCTTACTC TCAGACCCCA 3150 

AACCGCTTTT TGGGGGGTAT CATTTAAAAA ATAGATTTAG GGGTTACAAG 3200 

TGCAGTTCTG TTCCATGGGT ATATTGCATT GTGGTGGCAT CTGGGCTCTT 3250 

AGTGTAACTG TCACCOSAAT GTTGTACATT GTATCTAATA GGTAATTTCTT 3300 

CATCCCTCAT CCCTCTCCCA CCCTCCCACC TTTTGGAGTC TCCAGTGTCT 3350

ACTATTCCAC TAAGTCCATG TGTACACATT GTTTAGCGCC CACTCTAAAT 3400 

GAGCCTTTTT GTTTCATTCA TTCTGTAAGT GTTGAATAGG CACCACCTAA 3450

GGTCAGGTAT AAGTGGAAAT TTGAAAAAGA AACTGCCCAC TTGCCCCAGT 3500 

ACTTCCCTAG CCAAGAGGAG GGAAACCAGG CAGGTGCACC TGAAGGCCTG 3550 

TGAGTGCTTG ATTTGCTGTG CAGTGTAGGA CAAGTAAGAT TGTGCATAGC 3600 

CTTCTGTATT TAAGACTGTG TTAGGAAGAT TTCTCTTTCT TTTCTTTTCT 3650 

TTTTCTTTTT TCTTTTCTTT TTTTTTTTTA GGCAGATGAA AAGGGCGTCA 3700 

CAGAACAGGA ATAAAAATCT AAATATTCAA TAAATGAGAC CTAGGAGACT 3750 

ACTGCAGTGA CTTACAAAGT CCTAATAAAA AGATGTCTCT CCAAAATGGG 3800 

GCTGCAAAAT GTGGTGCTGC CTTATCAGCT CTAAGTTTTT TCCTTACCTG 3850 

AGAAAGAAGG AACCTGATGC AGGTTCAGGG CTCCTGCCCC ATGAATGCAG 3900 

GCTGACTCCA AGATGGGGAG CTACAGGGAC AATCCCAGGT CTTCTAGGCC 3950 

TCTTATTTAG GCCCTGGGAG CCTCCAGAGA TGGCCACATC TTCACCAGCC 4000

CAGATAGAGG GAAAGATCAC CATTATCTCA CCTCTGTGTC AAATACCTAG 4050 

ATGCTGTCCT CCCTGAGCCC ACACTATAGT TGCCAGCGCT AATTTAATGG 4100 

GTAGTGTACT GGTTAAGAGA TGGACAGACC ATCCTGGCTT GACTCTCAGC 4150 

TCTGGCAAAG ATGAGTGACT TGGTTTTTCC ATATCTCTTG GCCACACCAA 4200

CCTTGATTTC TTCAGCTGTA GAATGGAATT TCTCAAGCTT GCCTCAAGGA 4250 

TTATTGCCCG AGGATTTGAT GATATGGTAA GAGCTTCTCA GTGTTTGACC 4300 

CATAGTAAGT GTTTGACX3TT TCAAACGAAT TGTTTCTTTC TAGOACATGG 4350 

TGAGCATTTG GTAGCCATTC ACCGGTTTTC TGTTTCTTTG GATCATAGTT 4400

AACCTCTCCT TTTCCTTCTG GCACTACAAT TTTCTGGTGG GGAAGAATCC 4450

TTACTTTCTG CCCTTCCCCT TAAGGATAGG AAGCTGATAC TAGGCAGCAA 4500

CTAGTTGGGG GATAGGAAGA TTGTTCCAGA GAAATGCTGA ACCATAGGGC 4550 

TCCAGATCAC AGGACCCCAG TCTTAGCTTG CTGGGGTGTG GGGTGGGGGG 4600 

GGGCGGTTAC TGAACATGGG TATGAAGTAG ATGTCCATTT ACTGAAATGT 4650 

GAGGACCTGA GGCCTCTTCT ATTGCTGTAG CCAGCATATT CCCCAACCTC 4700 

TCCCCAAGAA AGGACAGATG GGGGTTCCCC CCTGGAGTAA CAGGTCCAAA 4750 

AGAAAAAACA TACAGTGGGA CTTCCAGGAT CTGGGCCTGA TCACCCAGCA 4800 

GTCAAGCTCC CCGCAATTGA CTAACACCCC CCTAACACGT AGAAATTCCA 4850

ATCTGCAATT TAGTGAGGAT GATACCTTTA TTCTTCTTAA ATACATCTCT 4900

TCATTTCCCA GAGCACCCTT TTTTCCCCTC CTCTGCACCT TTTTGTTAAA 4950

GACTGGAGTA TAATGAAATA CCAAGAGAGC ATAACATGTG ATACATAAAA 5000 

CTTTTTTTCT GGTTTACAAA ACAGTTCATT CTTGTCCATA CGTGCTTCTC 5050 

TCCAAGGCTG GCTGCTGTCT GTTCCAGCCC GCTTCGCTTG GAGAGGCCAT 5100 

CTGCCATACC TGCTCCCCAG ACGCATCGAC AAGCACACCC AGAGTGTTAT 5150

CTGCTAAGAC CTAAAAGAGG GAGGAACCCC CTCTCCTCAT CTAAGACCTA 5200 

GCTTCTAAAT TAGAGTGTGA GGGTCCATCT CCCCAGGAGG GGCACAGGGC 5250

CCAAACAGCC CAGCCATCTC AGAAGACAAC ACTAAGCTTT GTAGGGGTCC 5300 

ACAGTAGAGG AGAGTAAGAC GCCTGTTGTT TAATTTATTA CAGTTCCTCA 5350 

AAAGTGAAGA TGTGTGGGCG GGATGGCAAG AGCTGAGCAG ACGAAAGCTG 5400

AAGGAATAAG GAAAGAGAGG AGGACACAAA CAGCTGACAC TTCCTCAGTT 5450

CTTGTCATTT GCCTGGCCCT GTTCTAAGCA CCTTCTAGGT ATTAATCCAT 5500

TTAGTCTTGG CTACAACACT GTGAGTAACT AGTTTTGTCA CCCCCATTTT 5550 

AAAAATGAAG AAAGTGAGGC TCAGGGAGGT TAAGTAACTT GGCCACAGTT 5600 

TGAAACTAGA CTCTGATCAC ATGAGATAAT AGTGCCCATA AAAAGGGAAA 5650

GCAGATTATA TTTTTTAAAG GAAAGAGAGT AGGATATGGT AGAAAAAGAT 5700 

TGTTTGGAAA GGAATTGAGA GATTGATATA ATGAAAAGAA GCATTCACAT 5750

GAGAGTAACA GTATCAGGGC CCAAACCTTC ATCTAAGGTA CTTCAAAGAG 5800 

GCCTAAGCAA ACTTAGTCAC TGGCGTGGTT CTAGTCTCCA TGATGGCAAA 5850

TACATTGTGT ACAGCCCAAC TCCACACAAA ACTTAAATAC CAATGATAGA 5900 

GCAATCTAAA ATTTGAAAGA AAAAATCTTT CAATTTGTCG TCTTCCCAGA 5950 

GGGACTTAAT CAAGAAACCA ATCAAAATAC TTCCTAAGCC TAACTGTGTG 6000 

CAGAACTCCA AAGAGAGCCC AGCCCTAAAT CAACACTGTC CAATGGAAAT 6050 

ATAATATAAT GTGGGCCTCA TATGCAAGGT CATATGTAAT TTTAAATTTT 6100 

CTAGTAGCCA TATTAAAAAG GTAAAAAGAA ACAAGTGAAA TTAATTTTAA 6150 
TAATTTTATT TAGTTCAATA GATCCAAAAT GTTTTCTCAG CATGTAATCA 6200
ATATAAAAAT ATTAATGAGG TATTTATTAT TCCTTTTCTC AAACCAAGTC 6250
TATTCTATAA TCTGGCGTGT ATTATTTACA GCACTTCTCA GACTATATTT 6300 
CTTTCTTTCT TTTTTTTTTC CGAGACAATT TTGCTCTTGT CACCCAAGCT 6350 
AGAGTACAAT GGCGTTACCT CGGCTCACTG CAACCTCCGC CTCCCGGGTT 6400 
CAAGTTATTC TCCTGCCTCA GTCTCCCAAG TAGCTGGGAC TAGAGGCATG 6450
CACCACCACG CCTGGCTAAT TGTGTATTTT TAGTAGAGAC AGGGTTTCAC 6500 
CATGTTGGCC AGGCTAATCT CAAACTCCTG AGCTCAGGTG ATATGCCCAC 6550 
CTCGGCCTCC CAAAGTGTTG GGATTACAGG CGTGAGCCAC TGCACCCGGC 6600
CTCAGATTAA CTATATTTCA AGCGTTCAGT AGCCACATGT AGCTAGTGCT 6650 
ATGGTAGTGG ACAGTACAGA TCTGCATTTC AATTAAGACA CGTATACAAG 6700 
CATAGTTCAC TAATGCACGG TAAAAAAAAG TATAGTGCTG AGTCGGTGGT 6750 
AGAAATCCTA AATACTGCAG AGCAAAAGTG GTACGAACAG CAATCTCAGT 6800
GATAATGCAA CCATGCTTGC TTTTCATTGC AATTTGCTTA TTTTCCTTCA 6850 
GCAAAGTTCA TCCATTTTTG CCAATTCAAT AAATATTTAC TGATAAAAAC 6900 
TTTCAATATT AGATTCTTGC ATCTTCATAG ACAGAGTTGC TTTTCACATT 6950 
TAGAAAATTA CTTATCAATG TTAAACACAC GTTTTGATAA CCAGTGTTGG 7000
AAAGAGGTGC AGACTCCCCA TGTGCCTATT GATGGCAGAA ATATTCACAG 7050 
CCTAAGGGAA ACAAAGGGCT GGGGACAATC ACACACCTCA TGTCTCCTAA 7100
CTCCTGGGAA GTGCTGTCCC TCTGATTGAG CTCTTATTAT TGCCTTCCCC 7150 
ACTAACCCTG TCCACTGTGC CCTGGAGCCC TTTGCAGGGT TACCTGCTCT 7200 
GTCCTCCTCA CAGAATATCT CCTCTACCTC CTTGTCCAAG CTACAACTTG 7250 
GCTATTCTCT GATGACACTG TCTTCCCTGT AGCCCTTTTG AGTAATGGCT 7300 
GCATATTCTC CCATAGTCCA GTTCTTTTCC TGTTCTCCAG TCTGGCTTCT 7350 
GGATGACAGC CCACTAGTTT GAACTCCATA CTGCTATAGT TCAAGTCCCT 7400 
TTTGACTTGT TACCTTGGGC AAATTACCTC CTTTTGTTCA GGTTCCTTGT 7450
TTGTAAAATG ACGATAATAA TGCCATTTGC TTCAGTGGGT TATTTTGAAA 7500 
TTGAGTGAAA GAAGGCOGGT AGCTTCCCTA CACGCTCAGT GTAGACTAGC 7550 
CTGATGTGCA TTACGGGTGA TGCCATGACT CAGTGTGTTT TCCTCATCTC 7600 
CACATCTGGC TCTCATCCAG TGCTCCTGCT TACGGCACTC TGTCCCCCTC 7650 
TTACTTACTC CCCCTTATTA ACTGAAGACT GGCACTGATC TCACAGTTTC 7700
CTCTCCACTT CCTAGTCTCA CCATCATCCT AGATGACTTC AAGTCACCTA 7750 

GATAAACTGT CTCAGTTTCT TCACTCACAT TTTTTTATAA CAGATAATGT 7800 

TACACTCTUIG TTGTAACAGA ACCAGCTTAT CCAGCTCATG AAATGTATGC 7850 

ATTTCATCTC AACTCTGTAT TCAGTGACAT CCTGTGGGTA TCTGGAAATC 7900 

AGCCATGGTG AGAATATTTA CCATGGAAAT TGGCAAATAC TAAAAAGCAG 7950 

AGCACCTTTT TTTCTGAGAG CCAGACCATA GCTCTTCTAC TCCATAGCAC 8000 

CCATCATAAC AATTTTTAAA TACCTCCACT GAACAGCTTC TTCCTCTCTC 8050 

TACTTCTTCC ATATCTGATT TGAGCTTCTT AATTTATCAT GTGAACCACT 8100 

CTTGTAATAA TAACCCCAAA TCCCTGTTCC ATTGTTCTTC CTGCTAAAAT 8150 

ACTAAACCTG GTTTAGTCCA ACCATATTTT CTCTCTTTGG AATCTACAGG 8200 

GTGGCCCAAA AACCTGGAAA TGGAAAAATA TTACTTATTA ATTTTAATGT 8250 

ATATTAATAA GCCATTTTAA TGCTTCATTT CCAGTCTCAG TGGCCACCCT 8300 

GTATAGCTGG GCTATTGAGC TCTTGCGGGA GGAGGGAGTG GACAGTCTCC 8350 

CAGCCACACA GACTGATGTT GCACCAAACA TTTTTTAGCT TCCAGACTTC 8400 

CCTGGCCCTT AGTGTTACCC TTAACTCTCC ATTTCTCTGC CTTTCACATT 8450

CTCTACTTTT TAAAAATCTC TGACTCCACC TTCACCTTAT CATTCTTAGC 8500 

ACATGACCAT ACTTCTGCTT CCCAAAGAAA ATGAGCAATT ACTTCCTTTT 8550 

CCTTTTCCTC CTGTCATCAA ATCTGCAGAC ATGTCATGCC TAAGTCCAGC 8600 

TTTCCTCXrrT TCTCTGATCT CAGTCTGCTT CTTCCATTTC TGCCCTGAAT 8650 

CCCGTCCCCT CCCCAACCCC CAAGGACTTC GCTCTATCAG TCACCTCTTC 8700

CCTCTCCTGT ATCTTCAACT CCTCCCATTT TACTGGCTTC TTCCTCAAGC 8750 

CTTTCCCCAA GCCTTTCCCA TCTCAATTAC CTCCTCGCAC ATGCCTCTGC 8800

AQAAACCACC CCGTTTCTTC CCTCCCCTCG GCAGCCTOTT CTTCCTGTTC 8850 

TGCCCTCATG ATGGCACCAT CATTGTGTCA CTAAAATCAA TCTCTCCGAC 8900 

ATCATCAATG GCCTTCCTTT GTTGGGAAAC CTAATAAACA CTTTATCTTA 8950 

TTTGGTCTTT GTTATGGGTT GAATGAGGTT ACCCCGAAAT CCATATTAGA 9000 

AGTCCTAACC CCCAGTACCT CAGAATGTGA CTTTATTTGG GAATAGGGTC 9050 

ATTGCAGACG TTATTAGTTA GGATGAGGTC ATACTGGAAT GTGATGGGCT 9100 

GCTTATCTAA TATGACTGAT GTCCTTATAA CAAGGAGAAA TTTGGAGACA 9150 

GACACGCACA TAGGGAGAAT ACCATGTGAT GACAGGAGTT ATGGAGTTGG 9200 

AGTCAAAAAG CTATGGGAAC TTAGGAGAAA GACCTGGAAC AAATCCTTTC 9250 

CTGCGCCTAG AGAGGGAGTA TGGCCCTGCC ACTACCTTGA ATTCAACGTT 9300

TCGGCTTTTC AAAACTGTAA GACAATACAT TTCTGTTGTT CAAACCAATT 9350 

AGTTTGCAGT ACTCTGCGAC TGCAGCCCTA ACAAACTAAT ACAGTCTCTT 9400

GGAGGCATTT GGCAAGGTTG ACAATGGAAG CACTTTCTTA CCCCTTTAGG 9450 

TCTGTCGCCT TTCTTGTTGG GGGGTGTTTT CTAACAATTC CTCTCCATCT 9500 

CTCTCTCTCT AGTTTGTCTT AAACATTGGT GTTCTTCAGA CTTCTGACCT 9550 

AGGCCTTCTT TTCACTTCAC ATATTCCCCT GGGTGGTCTC ACCCACTTCC 9600 

AGAAATTACT TAAATTACTG CTCATGCAGT ACTGTGCTGG AAACTGTTTA 9650 

ACAACTGGCT CTCTGGGAAG AGGGGAGACT GGTTGATGGT TTTTGCTGAT 9700 

TTCTGTGGTG TAAATACTCC CTCCATGGCC AATTCCAAAC TGCCAACAGT 9750 

TTAACAACTG GCTCACAAAT TTTCTCCAAA TTTAACATTT GGCTTTCACA 9800 

GGCCAACAAC GTGGTACAGC CAACTCCAGC ACACCTCTGC TTTTGTGTCA 9850 

GAGAGAAGTA ACTTATTTTT GTACAAAAGG TAAAATAAAA ACACCTGCAG 9900 
GCCCCCTTTT TTTCCTTAAC AAACTGCTCT AGAAATAGAA TAGCTGAAGC 9950
TTCTTTTATG CATTCATCTG TTATTTCCAT GTCACTGTGG TGGTGGGATT 10000 
ATTTTTCCTT TATTTTTCTT GTATATGGTT GAAATACTGT ACCTTTGATC 10050 
AGTTTTAGTT TTATGGCATG TTTTGCACCC ATATTAAATC TAGTTTTTGT 10100 
CAGAGGGCGT CAATATTATT TTCTCAAAAC AAGAAAATAT TTCATTGCAA 10150 
AGGAGACAAA CAAAAAGGTC CTTAATACCA AAACTTTGAA ATGTGATTTC 10200
TTGTACTTGG CAGTGTCCAA GTGGTAAACC CAAACAGTAT TGGGTTTTCA 10250 
TTTTGTTCAG GAAAGTCTTT GTCTGGCAGC GACTTACCCT TACATCAGGC 10300 
GGGCCTTGCT CATTCATTCA CTTAAGTATT TATTAAACAC CAGCGGTGTG 10350 
CCAAGTACTT ATCTAGGTAT CGGGTAGATT CTGATAAGTC AGTCAGGTCC 10400 
CTGCTCTCAG GGAGCTTGCA GCAGAGATGG GGGCTGCAAT AGAGAGTAAG 10450 
CCAAGGAAAT GAAAAAGGAA GTTGATTTCA GAGAGTGATG AATGCTATGA 10500 
AGAAAATGAA GGCAGCGCAG TGTGATGGAG AGTGACCCAA GGTGGTACAG 10550 
TTTGTACCTC TAAGGACCAG ACTGTGACCC AGGTCACTCA CAGATGCCCG 10600 
TCATGTGATG CCACAGCAAC TTTTCCAGGT GCTCGTTTCC TCCCACTTCC 10650 
CAGTCTCTTG CCCAGCCGCG ACTGCTTACA AATACAGCTA GAGGAATCTA 10700 
AATGAGGTTC CTCTATCATC AAACCCAATC AAAATGCCAA GGAACAGAAT 10750 
CAGTGCCTGG CTGAAGGCAG TGGAACAGGG CCAGCCTGGA GTGGTTCTCT 10800 
CTGAGGAAGT TCCTCATCTT GGTTTTAGGG CCATACCTTG TGACCTGTGA 10850
GCTAGGGGTT GCCAGTCCCT GACATTTCTA CTGAGGACTC GCCTGTCTAT 10900 
ATTCCCGGCC TGTATGTGTC TCCTGAGTTC CAGACACACA GGGCGAAGCXS 10950 
CCTGATGGAT GGAAGTATGT TTTTTGGTGT TCCATTGGTA TCTCAAATTC 11000 
TACAAAACTT AGTGCCCCTT CTCCTCCCTG TTCCTCCCCA TCTTCAGTCT 11050 
ATCACCTGTT CCTCATCCAG CAAATGATAT TACCATCTTC CAAGGAGCTT 11100 
CCCAGGAGTA ATCCTTGACT CCTCCTCAAC ATCCAATTAA TAATCAAATC 11150 
TAGGCCAGGT ACAATAGCTC ACGCCTATAA TCCCAGCACT TTGGGAGGCT 11200 
GAGGCAGGTG GATCATTTGA GGCCAGGAGT TCAAGACCAG CCTGGCCAAC 11250 
AAGGTGAAAC CTGTCTCATT TAAAAAAAGT TATTTTAAAA ACTCAAATCT 11300 
ATTATTTCTA CCTCTAAGTG TGTCTTGAAT TTATCCATCT CTCTCCATCT 11350 
CTGAGCTGTT ACCTTACCTC AGTCCATCAC GTTTTGTCTA CGTTAACATG 11400 
ACCAGAGTCT TGTTCTTAGT CTGGTGAGGT CACTCCAGCT GCTTCAGATC 11450 
CTTCCATGGC TCACCGTTGC CCTCATATAA AGTTGGCACT CCTGGACATG 11500 
TGGCTTACGG GGCCCTCCGT GATGTGGCCC TATTTGCTTC TCCATTCTGT 11550 
TCTCTCCCAG CCTCTCTGCC CCCATCTCTA GGCACCAACC ACACCCTTCT 11600 
GCTCGTCAAT GGTGCCAGCT TCTCTTCTAT CTCTGGTCTT TGGACAGACT 11650
TTTCCCTTCA CCTGGAATGC TTTCTTCAAT CCTACCCCAC TCTCTTTAAT 11700 
CTAGATAAGG TTTATTCTTT TTGAATGTCT AGCAGTGAAA CCATTTCCCC 11750 
TGAAAAACCT TCTCTAACCA ACCCCCTACC CTCAGCCCAA GGTCTAGATT 11800 
AGGAGTCCCT CTGAATGTTT CCATAGCATT TTTAAAGAAT TGCXTTATTTA 11850 
CTTGTTCGTA TCTATCACTA AACTACAAAT TGTATGAGAA CAGCCACTAT 11900 
CTCTGCCTGG TTCACCATTC ATCTCCAGCA ACTAGCATAA TGCCTGGCAG 11950 
AGTCAGCCTG CAACAAATAT TTGTTGAATA AATTAACAGA TGGCTTTATC 12000 
TCCTTAAGTA AATCTTGCTT TTTTCACCTA TTAAAACAGA CGCACAGGCC 12050 
AGGTGTGGTG GCCCATGCCT GTAATCCCAG CACTTTGGCA GGCTGAGGTG 12100 
GGCGGATCAC CTGAGGTCAG GAGTTCAAGA CCAGCCTGGC CAACATGGTG 12150 
AAACCCCATC TCTAATAAAA ATACAAAAAT TAGCTGGGCA TGGTGGTGGG 12200 
TGCGTATAGT CCCAGCTACT AGGGAGGCTG AGGCAAGAGA ATCGCTTGAA 12250 
CCCAGGAGGC AGAGGTGGCA GTGAGCCGAG ATCATGCCAC TGTACTCCAG 12300 
CCTGGATGAC AGAGACCCTG TCTCAAAACA CAGACACACA CAGACACACA 12350
CACACACACA CAGACACACA CACACACACC AAGTTGTATA ATTTAAAATA 12400 
TAACGTGCTT GTTATGGAAC ACTTGTAAAA TACAGGAAAG TAATGAAAAA 12450
GTCTACCATC TAGCTCACCA CATAATGACC ATTGCTATCA TCCTGGCATA 12500 
ATTCTCTCCT GTATATAAAT ATATATTCTT TTATTGTTAA AATTACACTA 12550
TGAGTACTAT TTATTTATTT TACTGTGGCA AAATGCGCAA AACATAAAAT 12600 
CTTGCCATTT TAAGGTATGC AGTTTGGTGC ATTCACCACA CTCACATTGT 12650 
TGTGCAAATA TCACCACTAT CTATCTCAGA ACTTCTTCGT CTTCCCAAAC 12700 
TGAAACTCTG TACCCATTAA ACAATAGTGC ATCCTCTGTT TTCCCCTCCC 12750 
TACAATTTAT TTTTATTTGG GTTTGTACCA AACTGAAAAT AGCTGCTTCT 12800 
TCCTTACTTA GTTCAGATTA GCATTTCCAT TTATTTAGCC GTGGTTTTGA 12850
GGATGCCATG ACAGATGCCA TCCTTCCTAG AGCTCTTTGG GGCTGTCAGG 12900 
TATTTCAGTC AGGGTGAATT CGGGTTGATA ACATTTTAAA ATCTCACTTT 12950 
ATTCTGAGGT TCCTAGTGTC AGAGCCCACC GTATTTTTAG GGACTCCCAA 13000 
GTTACAAACA AAAATATGGT GAGGAGGAAT CACTGAAGTT TTAACACAAG 13050 
AGACTTACAT TTTGTTCAAT TTCTATCTTT TAGTTTATTT CCTAAGCATA 13100 
AAGAAATACT TTGAAAATTT TACATAGCAT TATACATATT TAATTAAGCA 13150 
TGAGCACATC TTAAAACTTT AAATTTTAGA TCAGATCTTT AATTCCTAGG 13200 
ATATTAAGAG GTACTGGCAA TTTGGCCAGG TGTGGTGGTT CACGCCTATA 13250 
ATCCCAACAC TTTGGGAGGG TGAAGTGGGC GAATTGCTAG AGCCCAGGAG 13300 
GTGGAGGCTG CAATGGCCTG AGATCACGCC ATCGTACTCC AGCCTGGATG 13350
ATGAGAATGA AATCCTGTCT CAAAAAAAAA AAAAAAAAAA AAAAGAAGAA 13400
GAAGAAGTAT TGGCAATCAG TGCTCCAGGA ATAATTTCCT GACTTGAAAT 13450 
AAACCTACAT GTAGACAAAC TAATTAGGCC ATTCCAAGAG TTGCTAGCAT 13500 
TGGTTTAATA TGTTTTCAGA GCATTCCAGG AAGCAGTGTG GCCAGCATTG 13550 
CATGTTTGAT ACTTCAGAAA TGTATGACAG GTGTTTCTCT TACCCAGGTC 13600 
TTCTGTTTTC TTAGTTTTGC TCATGTAAAT ATTTATGAAC ATCCTCATCT 13650
TTTTGAGGGA AGGGATTATA GATCATTCTA ATTCCATTTT CTAGCATTTG 13700
GTACCATTCT AAGCACATGA TAGGCACCCA TTTGGAGCAT TTTTGGCTTG 13750 
ACAGAATATG CATTTAGAAT TGTTCAAATT AGAGGTGTCA GTGATGGGAA 13800
TTAGAATACT ATATAATTCT AAGTCATTTG ACTTAAATAC AAAAGAATCA 13850 
TTTTCCTTGG TGGGGAATGG TGAAGGGAGG CAGGAGTTAA GAAGAGGAGA 13900 
AGAGATCCTA AGTCATTTAT AAACTTCTCT GGAAAGACAG GTGTGTGAAG 13950
ACTTTTTAAA AAGTCATTCA CCAAATTGTG TGTGTGTGTG TGTGTGTCTT 14000 
TTAAATAGAC TTTATTTTTT AGAGCAGTTT TAGGTTCACA GCAAAATTGA 14050
ATGCAAGGAC AGAGATTTCC CATAAACCCC CTGCCCACAC ACATGCATAG 14100 
CCTCCCTCAT TATCAACATC CCCACCAGAG AGGTGTTTGT TCTAGTTGAT 14150 
GAACCTACAC TGACACATCA TTATCACCCA AAGTCCATAG TTCACGGCAG 14200 
GGTTCACTGT CGGTGTACAT TCTATGGGTT TGAGCAAATG TATAATGACA 14250 
TGTATCCACC ATTATAGTAA CATACAGAGT ATTTTCAGTG CCCTGCAAAT 14300 
CCCCTGTTCT CCACCTATTC ATCCCTCCCT CTCTGCATTT CCACCCCCAG 14350 
CCCCTGGTAA CCGCTGATCT TTTTACTGTC CCATAGTTTC GGACGATCTA 14400
TTTTTCAGAC AGACACAGAG CTGTCTTTCC CTTAGTTTCT ATTCTATCAT 14450
TTCTTTCTCC CCATCCATCA TAAAAGGCTA TGAGTTTTTT TTAAGTGTTG 14500 
AACACCATCC TACTTGTCAA GTTAAAACAT AAGCTCCTGG CTGGGTACAG 14550 
TGGCTCATGC CTGTTkATCTC AGCATTTTGG GAGGCTGTGG CAGAAGCATC 14600 
ACTTGAAGCC AGAAGTTTGA GACCAGCCTG GGCAACATAG CAAGACCCCA 14 650 
TCCCTCCACA CACAAACACA CACACACACA CACACACACA CACACACACA 14700 
CACACACACA CACAAAAACA AGCTCTTGCC AGAATTAGAG CTACAAATTG 14750
CCCTCAGGTT CXTTAGAAGAT CAGTCCTTCA ATTAGATTCA GATTGAGATG 14800
CTTTCCTCTTT TAAACAATGA TTCCCTTTCT ATCATGCCCA ATAAGAAAAC 14850
AAATAAAAAT TAAACAATAC TGCCTGTAAT CTCAGCTACC CAGGAGGCAG 14900
AAGCAGAACT GCTTCAACCC GGCAAGCAGA AGTTGCAGTG AAGTGAGATC 14950
GCGCCACTGC ACTCCAGCCT GGGAAACAGA GCAAGATTCT GTCTCAAAAA 15000
CAAAACAATG TGATTTCCTC CTCTAAGTCC TGCACAGGGA AATGTTAAGA 15050 
AATAGGTCCA CCAGGAAAGA AGGAAGTAAG AATGTTTGAC TAGATTGTCT 15100 
TGGAAAAAAT AGTTATACTT TCTTGCTTGT CTTCCTAACA GTTCTCCAAA 15150 
GCTTCGTACC TTGGCCAGAG GCTTGTCTCC TGCGTACCTG AGGTTTGGTG 15200 
GCACCAAGAC AGACTTCCTA ATTTTCGATC CCAAGAAGGA ATCAACCTTT 15250 
GAAGAGAGAA GTTACTGGCA ATCTCAAGTC AACCAGGGTG AAAATTTTTA 15300 
AAGATTCACT CTATATTTTA ATTAAOGTCA GTCCGTCATG AGAATGCTTT 15350 
GAGAAAACTG TTATTTCTCA CACCTAACAA TTAATGAGAT TAACTTCCTC 15400 
TCCCCTCATC TGACCTGTGG AGGAATCTGA ACAAGAGGAG GAGGCAGTGG 15450
GCAGGTTTCC TTATCATGAT GTTTGTCATG TTCAGTGTGA GGCCTCACAA 15500 
AAAAAAAAAA AAAAAAAAAA GGCGTCCTGG ATATAACTGA GAGCTCATTG 15550 
TACAGTAAAT ATTAATAAAA CAGTGATTGT AGCTGAAGGA TAGAACTGCT 15600 
TGGAGGGAGC AAGTGGGTAG AATCGCGTCA AACTAAAGAG CATTTCTAGC 15650 
CAAAGACACA ATGATAGATT GAAGGATATT TATTCTAAAT ATAGAATATG 15700 
GGTGAACGAG ATCTGTGGAC TTCTGGGCTC CAACGTTAGA TTCTGATTTT 15750 
AGCAAGCTTG TCAGGGGATT CTGATATTGA AAGGCTGTGG CCTTCACCTG 15800 
AGAAACCTGC CCTAGGGGGC CATGAAAATT TGTCCTGTCT TTCAGAAGTG 15850 
CTATCAGACA TCAAATGGAA GTTAAATCGT ATCTTAACAA TTACTAGGAT 15900
GGGCGCAGTG ACTCACACCT GTAATCCCAA CACTTTGGGA GGCTGAGGCA 15950 
GGAGGATCAC TTGAGCCCAG GAGTTCGGGA CCAGCCTGGG CAACATAGAG 16000 
AGACGTTGTC TCTATTTTTT AATAATTTAA AGAGAAAAAA ATACTGAAAA 16050 
TATTGTATAC ACCACTGAAT TATAATAATG TGTATATAAT GTATATATTC 16100 
ATTATGAGGA ATATTTGATT ATTTCATATA TTATATCTTT TCCTTCTGTT 16150 
TATTTTATCC AGTTATGAAG TATTTAGAAC AATTCATCAG TAATTGGGGC 16200 
TAAATTGACA GAATAGTAAT CAGAGAAAAT AGAAAAAGAC AGATGGGTTA 16250 
TCTTTGAATA CGAGGTTGGA GTTGTTTATG GGTTTGTTTT TTGTTTTGGG 16300 
GGCGTTTTTT TAGACAGAGT CCCACTCTGT TGCCCAGGCT GGAGTGCAGT 16350
GGCACAAGCA TGGCCCACTG CATCCTTGAC CTCTTGGGCT CAAGCAATCT 16400 
TCCCACCTTA GCCTCCTGAG TAGCTGGGAC CACAGGTGCA TGTCACCACA 16450 
CCCAGCTAAT TTTTTTATTT TTTGTAGAGA CAGTCTTTCT ATGTTATCCA 16500 
GGCTGATCTC AAACTCCTGC ACTCAAGTGA TCCCCCTGCC TTGGCGTCCC 16550 
AAAGTATTGG GATTATAGGC ATAGCCACCA CACCCAACCT AGTTTCTATT 16600 
TAGACTTGGC CCTTTCCCAC CAGTCATTTG TGTCCAAAAG ATCTCATAAA 16650 
TGTAGACAGG AAACTGTCCT TTGCTCATCA GTTTTCTTCA TCCTGTGTCT 16700
AGGGGGATGG TCGGTGGGGG AAACTGGGGT TATGCAAGTT CCTCTGAAAC 16750 
ATCCTCTGTG AGCCCAGGGA TGGATGAGGC ACCAGCCGCC AGCGAGTCAG 16800 
TGTGCAGCTT TCCAGAAAGG AAGTCATCAG CCAGTCAGCC GGCCCTGGCA 16850
GCCAGCACCC GGCAACCCTG CTGTCTTGTG ATAAAGAAAT GGTCTGCCTG 16900
ACAGGATGGT GTGGATTTTT CTTTTTTCTT TTTTTTTTTT TTGAGACAGG 16950
GTCTGGCTCT GTCGCCCAGG CTGGAGTGCA ATGGCGGGAT CTTGGCTCAC 17000 
TGCAGCCTCT GCCTCCCAGG CTCAAGGCAT CCTCCCACCT CGGTCTCCCG 17050 
AGTAGCTGGG ACCACAGGCA CACACCACCA CGCCCAACTA AGTTTTCGTA 17100 
TTTTTAGTAG AGGCAGGGTT TTACTATGTT GTCCAGGCTA GTCTCAAACT 17150 
CCTGAGCTCA AGCTATCCAT CTGCCTTGGC CTCCCAAAGA GCTGGAATTA 17200 
CAAGCGTGAG CCACTGTGCC TGACCAGGGT GGATTTTTTC AAGTGCACAT 17250 
GTTGTGGTCC CAGAAGCTCT GATGGTACCA AATTCCAAGC GAAAAAAAGT 17300 
CAATGGTTCC CACCCATCCT ACCTCCCATG ATGGCAAGAG GAAATCACCA 17350 
CACTGCAGAT ACAGTCCATG TAAAACAAAT TGCTATGGAT TTTGAAAGTG 17400 
AACCTTAAGA GAACTGCACT ATGTTTTCTT CATTAGAGTT CTCTGGTAAT 17450
TTCCAGCTTT TTTTTTTTTT TTTTTTAGAC AGTGTCTCGC TTTGTCGCCC 17500 
AGTGTCACCC AGGCTGGAGT GCAGTGACGT GATCTCGGCT CACTGCAACC 17550
TCCGCCTCGT GGGTTGAAGT GATTCTCCTG CCTCAGCCTC CTGAGTAGCT 17600
GTATTTTAGT AGAGACGAGG TTTCACCATT TGGCCAGGCT GGTCTCGAAC 17650
TCCTGACCTC AAGTGATTCG CCCATCTCAG CCTCCCAAAG TGCTGGGATT 17700
ACAGGTGTGA GCCACTGCAC CCGGCCAGTA ATTTCAAGCT TCTGAGGAGC 17750 
CCTTTGAATT GTTAAATAAC TTGTAGCTAT GTCCAACATA TCCATGTTCA 17800 
GTGTATGTTC GATATTTCTT AGGAAACCTG CCCTTGGTTG TTTTCTTTGT 17850 
GGTAATTCAT GAGCCGGCAA ATTTGACATG TGTTACAGAA TATACCTTTT 17900 
CTCTGCTCTC CTACCTCATA ACCAGAACTT AATTATCCTG CTTTAGTCAC 17950 
ATAAATAGCT AACTAAATAA ATATATGAGA TTTCAGTCTG CTCACTGTGA 18000 
AAATAGACCT TCTAAATGAT CTCTTCCACT TGCAGATATT TGCAAATATG 18050 
GATCCATCCC TCCTGATGTG GAGGAGAAGT TACGGTTGGA ATGGCCCTAC 18100
CAGGAGCAAT TGCTACTCCG AGAACACTAC CAGAAAAAGT TCAAGAACAG 18150 
CACCTACTCA AGTAAGAAAT GAAAGGCACC CTAGAGATGT TCCAGCCCCA 18200 
AAGATATTTG AATAGGTTGG ACTCGGGCAC CAATCTAGCA AGTCCTACGG 18250 
AAGTTGTATA AAGCTGAAAA TACTGAAGCA TTTCCCAAAT GGGAAATCCT 18300 
AAACTCAAAA CTTGCTTTTT GGTTTTTTTG TTTGTTTGTT TTTTCTTCAT 18350 
CTGACATTGC TTAGTAGTCA CAGAATGAAA GATAAATCAA TCATTCATGA 18400
TCTAACAATG ACCTTCAGTO CTCTAAAAAA CTACGGAGTC AAGGAAAACA 18450 
TGAATATATT CCTCATGTAA AATTAAAATA CAGACATATA AAGGGCAAAA 18500 
CATGAACATC ATTCATACCT TGAGGTCCGT CCCCCTCCCA GAAATAACCC 18550 
CCAGTATGCC TTGGTTTAGA GCATTAAGCA GGAGGGCCCT GAGTCACTCC 18600 
AGACAGTCTT GACCACCAAG CAGCATTCTC TTTTTGTTTC CTCTGTGGCT 18650 
TTTGCAAACA CAGGGCTAGC TCAGCTACCC ATTAGTATGT TTTCAGTCAC 18700 
TAAAACAGTC TTCCAGTCTT CAAATTAGGA TGACATTGTC ACATGGGGCT 18750 
TTAAAGCAAG TGAAACAAGG AACCCCCTTT TTTTTTTTTT TTGAGATGGA 18800 
ATCTCACTCT TGTCGCCCAG CCTGGAGTGC AATGGCGCAA TCTTGGCTCA 18850
CTGCAACCTC CACCTCCCAG GTTCAAGAGA TTCTCCTGCC TTAGCCTCCT 18900 
ATTCATTATG AGGAATATTT GATTATTCAG TTCCTGTAGG GTAAAGATAT 18950 
TACCCCCGAT CATATTATTG ATTATTGAGT AGCTGAGATT ACAGGTGCCT 19000 
GCCACCACGA CCGGCTAATT TTTTGTATTT TTTAGTAGAG ACAGGGTTTC 19050 
ACCATGTTGG CCAGGCTCCA GGCTCGTCTC GAACTCCTGA CCTCAGGTGA 19100
TCCACCCACC TCAGCCTCCC AAAGTTCTGG GATTACAGGC GTGAGCCACC 19150 
ACTCCTGGCC ACAATCCTTT TTTAACTATG AAATATATTT TTATCTGAAG 19200 
TTTGATGTTT ATACCCAACT GAGGGATGAT GTTCCCATAT CTCAGTTAAA 19250 
GAAATAACCT GCTCAGATAC TTCAAGCTCT TCTTTTGACT TTTGAAAATA 19300 
AATGATCTTG AAGTTACTAT ACTTTGTTTG GGTTAGTTAA CATTATTTAA 19350 
AGTATATTAT TTTAATTAAT TATCTTTGTA AGATTTTACT GTATACTACC 19400 
TGGAGTTCAA TGTATCAGAT GGATTTCAAA TTTATGTACA TTTTTTATGT 19450 
ATATGGTACA GAAAAAAATG TGATCCATAA GAAATCAGAA AATAGCGCAT 19500
ATGCTAATAG CTAATGTTGT CCTCTAAAAA ACTTATTTTT GCATTTTTAA 19550 
GAGGGGGATA TACTCTGACA CTTTAATAAG TGTAATTAAT TATTGACTGG 19600 
AATTTGGCAT GAGGCAGGGC CATTTCAGAT CCCATTAAAG GAATGACACA 19650 
TACCAGAGAA CCACAGAAGT AAGGCCACAT TTGTAATAAA TCATTATAGC 19700 
TCTGCTAGGA GAAGACCCAG TTGTATTAGG TAATTAATGG ATTTGCTCTT 19750
AAAACACATG TCCCGGAAGA TATAGGTGAG TCTTGGGGGG CCGCATTAAA 19800 
CATTATACCA ATGTATCTTA CATTTCTAAG AAAGTTTTAC TACTTTACAG 19850 
GATCTTTCTG TTACCAAAAT GGAAGGTTTC CAACTCCAGG ACTTGGCTTT 19900 
CATAGTTCCT ACACCAGGGG AAATGCCTTC CTTTGCTAAC TATGCAACCA 19950 
GGTTAGTTAG TGTAAGTCCA GCCACCCTGT TGGCAATGCT AAAAGGTACA 20000 
ACAAACACAG AATTTTATTT GCATTTGTAA ACATTTGATT TCTGGCTCGA 20050 
AATTTTCAGT TTTCATGGGC ACGTCATGGA AACAGAAATC TTCTGTGTTT 20100 
AGTTTGGGCA CCTACTCATT GTAGTGACAA ATATTTCAGA AGCCAATAGG 20150
GGATTCCACA AATTGTTCTG AACCTGTGGC TGAGACTGGT AAIXSGCTGAG 20200 
TGACATGGGG ACATACCACA AAAGAAGAGG TAGCAAAAGG CTGCTGAGAT 20250 
AAGGACATGT TCATTGCTTA GCTAGTGGCC TGCACCCTTA AAACACATGT 20300 
CCCAGGCTGG GTGCTGTGGC TCACGCCTGT AATCCCAGCA CTTTGGGAGG 20350 
CTGAGGCGGG TGGATTACCT GAGGTCAGGA GTTCGAGACC AACCTGGCCA 20400 
ACATAGTGAA ACCTCATTTC TACTAAAAAT ACAAAAATTA GCCAGGCATG 20450 
GTGGCGGGCG CCTGTAGTCC CAGCTACTCA GGAGGCAGGC AGGAGAATTA 20500
CTTGAATCTG GGAGGCAGAG GTTGTGGTGA GCCGAGATTG CGCCACCGCA 20550 
CGCTAGCCTG GGCGACAAAG TGAGACTCTG TCTCAAAAAA ACAAAAACAA 20600 
AAAACAAACA AACAAAAAAC AACAACAACA AAAAAACGGG TATCCCAGAA 20650 
GATACAGGTA AGTTTTCTAA CACAGGTCCT CTTGTATGGT GCGTTCCACT 20700 
TAAGTAGAAG ATGACAAAAA CATTTGTCAT GAGAATATAG ACTCACATTT 20750
TAAACCTGTT TGAGCAGGAA AAGGAAGCAA TGTTACAGAT GTAATTCTGG 20800 
GTGTGACTGC AGAAAGGATG ACTCCCTTAT TAAAGTAGTC ATCCTGAGTG 20850
AGCTAACTCT TTGTACTTCC TCTTCTCCTC CTGTTCCCCT CATCACCCCA 20900 
TTCTTCCGTT GCCTACACCC AGGCCCACAT TGGATGCTGA CATAGACTTA 20950 
CATGGTACAG TCCAAGGGAA AGATCTGCCA TTTTTTTCAA TGTGTCATCT 21000 
TGGTTATCTT CATTCCAAGG ATCTCTCCAC TCTTTATACA GTAAGAGATG 21050 
AGAGTCTGGA AAGGATTGGG AATAAGATAA TGAATTGTAA GTTTTAAATT 21100
GTTCTTCGTA TTTTGGGGAA GGAGTAGGCT AGGTGGTCCT TCTGTTTTTT 21150 
TTTTGTTTTT TTTTTTAAAG TAGATGTGGC CAGACGTGGT GGCTCACGCC 21200
TGTAATCCCA GCACTTTGAG AGGCTGAGGC AGGTGGATCA CTTGATGTCA 21250 
GGAGTTCAAG ACCAGCCTGG CCAACACAGT GAAACCCCGT CTTTACTAAA 21300 
AATACAAAAA CTAGCCGGGC TTGGTGGCGT CCACCTGTAG TCCCAGCTAC 21350 
TGCAGAGGTG GAGGCAGGAG AATCACTTGA ACCCGGGAGG TGGAGGTTGC 21400 
AGTGAGCCAA GATCATGCCA TTGTACTCCA GCCTGGGCGA CAGAACAATA 21450
CTCTGTCTCA AAAAAAAAGA GAAAAGAAAA GAAAAAAAGA ATGGATTTGA 21500 
ACTCAGTCGT CAATAGCCTC TATTCCAGGA GATGTTACAG TTGATTATGT 21550 
TATAGGGGGT GTATAATAGA ATTTCGAGCT ATGTAAATTC CAAGTGCATT 21600 
TGGAAGAATG AAGAAATGGA GGAAGGGTAA AGTATGAGTG CAAGCATTCC 21650
AGGTTTTTTG AAAATGCTAT AATCTTTGTT CAGGGCTAGT ACAAAGTGCT 21700 
ATTTAGCTGT AAGGGTTTTT TGTGATTTAC AGACAGTTTT CACATGTGTC 21750 
ATTTCAACCT TGGTTTTATG GCGAAGGCAT GTGATGGTGC TTGTCCCAGG 21800 
ACTTTAGATC CATATCTGAG GTTCCTGTCG GGCAAAGATA TTACCCCTGA 21850 
TCATATTATA GTCTATAAGT GGGAGAGTTG TGCCTGGAGC TCAAGTCTTA 21900 
TGATTTCTGA TCCAGGGCAC TTCCTACAAC ATGATTTTGC AATATAAAAG 21950 
CCTATAATGT GTGACTAAAG CAGGTCACTC ACCCCTTGTA ACAGACTCTA 22000 
GTAATQGTAC TGCCACCAAA CGGCTGCGTG ATATTGGGCA AAGACTTACC 22050 
TTATTTGAAT CTCAGTTTCC TCCTAGAAAA ATGAGGGTGG AGGTTAAGCA 22100 
TAGGCTGATG ATCCTAAAGC CTCCATACTG CCCTAAACTG TGGCTCTAAG 22150 
ATCCAGTAGA ATGCTGGGTC ACAGGACTCT AGGGAGCTTT TCAAACCCAA 22200 
ATGTCTGTCA TTCCTTGATG GTAGGCAGCA GTTTATGGAA GTGGGCGACA 22250 
CAGCAAATAT CAAAATACCT AAAGGAGCTT GCAAGAGTTG TTTCTGCCTA 22300 
GTGGTCTTTA TAGTTAATAT TAAATAGTTA ATTTTTTTTT TTTTTGAGAC 22350 
AGAGTCTTGC TCTGTTACCC AGGCTGCAGT GCAGTGGCAC AATCTCCGCT 22400 
CACTGCAACC TCCACCTCCC GGGTTTGAGC AATTCTGTCT CAGCCTCCCA 22450 
AGTAGCTGGG ACTACAGGTG CATGCCACTG CACCCAGCTA ATTTTTGTAT 22500 
TTTTAGTAGA GACGGGGTTT CACCATATTG GGCAGGCTGG TCTCGAACTC 22550 
TTGACCTCAG GTGATCCACC TGCCTCAGCC TCCCAAAGTG CTGGGATTAC 22600 
AGGCATGAGC CACTGCACCC AGCTTAAATA GCTAATATTT AATATTATTC 22650 
TATAGTTATT CAAGTAATTC AGGCCAAAGA CTTAGAAACA AAACAAAAAG 22700 
CCACTTTTAA GGAGAAAGGG TGTAAGTTTG CCAGATAGAT AGAGATCTTT 22750 
CTTTTTTAAC TACAAGAGTT CAGGAATGAA TTACTCTTTA ACAAACGACT 22800 
ATAGATATAC ATGAAAATTG GAAGGACTTA TTATGCATAT GATAATCAAT 22850 
TTAAAGACAA CACTTAAAAT TATATTGTTG CCACTCTCAA AAAGTGGTAA 22900 
TAGAACAGCT AATGGTTTAA AAAGCAGAGT ACAGAAGTTC CCAAACTTAT 22950 
GGCACCTTAA TATCGCAGAA AACTTTTTAA AGCATGCCTA GGCCACAAAA 23000
AATACCTGTA TTTTGATTAT TAAATTGTAA GGTCTACACA ACCTAATAGT 23050 
AATAGGTCCA ATAGTAATGC TGTCCAATAG ATGTTGATGT TTTTTTCCTT 23100 
GCAAACTTAA AAGATCCTAC AGTGCCTCTG TAAATAGCAC TGCCTGGTTA 23150 
GAGTTGAATT TCAGATAAAT AATTTTTTTC ATGTTAATTA T ' I " I"i "l' C I'i " r i ' 23200 
CTTTACTTTT TTTTTTGTTT TTTTGTTTTT TTGTTTTTTT TTTTGAGACA 23250 
GGGTCTCATT CTGTTGCCCA GGCTGCTGTG CAATGGCATG ATCATGGCTC 23300 
ACTGCAGCCT TGACCTCCCT GGGCTCAGGT GATCCTCCCA CCTCAGCCTC 23350 
CCAAGTAGCT AGCTGGGACT ACAGGTGCTT ACCATCATGC CCGGCTAATT 23400
TTTGTGTTTT TTGTAGAGAT GTGGTTTTGC CATGTTGCCC AGGCTGGTCT 23450 
TGAACTCCTG GGCTCAAGTG ATCCGCCCGC CTCGGCCTCC CAAAGTGCTA 23500 
GGATGACAGG CATGAGCCAC TGCACCTGGC CCCTGGGCGA AGTATTTCTT 23550 
AATGGTTACA TAGGACATAC ACTAAACATT ATTTATTGTC TATATGAAGT 23600
TCAAGTTTAA CTAGGTGCCC TGCACTTTTA GTTGCTAAAT CCTGTAGCTG 23650 
TACCCATGCA TTCACTGGTG CTCCCCAGCT TGCCTTGCAC AGAGTTTGGA 23700 
AACCATAGTC CTATAACTCT AGGCCAATTT TTTAATGTAA AATTTGATTC 23750 
ATTTTAAATT AATAAATAAT AACAGGAATT TTTTTAAAAA TTGTTTTAAA 23800 
TATAATTAAA ATTATCAAAA TATTTTTTAA CTGAACTTGT GACTAGAGAT 23850
ATTTAGATTA TGAAGAGTGG GGTTTATGCT AACTAATGAC AGTCTGGCTA 23900
TGCATGTGGA GCACTGAGCT ATAAATTGTG GCTTCCCCAA TTCTCCTCAT 23950
GTCACTTGAA CAAAACCTAA GTGTCAGACC AGAGCTTCTG GTATCTTCCA 24000
TGGGATTTCA TTCAACAGCT GGAGCAAATG AAGTCAQATT GATTTTTTTT 24050
AATTTGTCCA ATTTTGTTGT CTCAAAAACA TAATTATAAT CATTTATTAG 24100 
AACTAGAATT TCTTCAGTTT AACAACAGAA ATAGTTATTC ATTATGAAAA 24150 
<5CGAATCTGG AGGCCTTCAT TGTGGTGCCA ATCTAACCAT TAAATTGTGA 24200 
CGTTTTTCTT TTAGGAAGCT CTGTAGATGT GCTATACACT TTTGCAAACT 24250 
GCTCAGGACT GGACTTGATC TTTGGCCTAA ATGCGTTATT AAGAACAGCA 24300
GATTTGCAGT GGAACAGTTC TAATGCTCAG TTGCTCCTGG ACTACTGCTC 24350
TTCCAAGGGG TATAACATTT CTTGGGAACT AGGCAATGGT GAGTACCCCA 24400
GGGAACAATT CATTAATAAG GAGATTCCCC ACTAGCATTA TTTCTTTTCT 24450
TTTCTTTTTC TTTTCTTTTT TTTTTTTTTT GAGACA6AGT CTCGCACTGC 24500 
TGCCCAGGCT GGAGTGCAGT GGCGCCACCT CGGCTCACTT GAAGCTCTGC 24550 
CTCCCAAAAC GCCATTCTCC TGCCTCAGCC TCCCGAGTAG CTGGGACTAC 24600 
AGGCACCCGC CACCGCGCCC GGCTAATTTT TTT l " i ' X '' rri " i ' ■ IT ' rjfrri"! " ! " ! ' 24650 
TTTTTTTGCA TTTTTAGTAG AGACGGGGTT TCACCGTGTT AGCCAGGATG 24700
GTCTTGATCT CCTGACCTCG TGATCTGCCC TCCTCGGCCT CCCAAAGTGC 24750 
TGGGATTACA GGCGTGAGCC ACCAGGCCCG GCTAGCATTA TTTCTTATGA 24800
CACTTTTTTT TTTTTTTTGA GACGGAGTCT CGCTCTGTCG CCCAGGCTGG 24850
AGTGCAGTGG CGCCATCTCG GCTCACTGCA AGCTCCACCT CCCAGGTTCA 24900
CGCCATTCTC CTGCCTCAGC CTCCCGAGTA GCTGGGACTA CACGCACCCG 24950
CCACCACGCC CGGCTAATTT TTTTGTATTT TTAGTAGAGA CGGGGTTTCA 25000
CCGTGTTAGC CAGGATGGTC TCTATATCCT GACCCCATGA TCTGCCCGCC 25050 
TCGGCCTCCC AAAGTGGTGG GATTACAGGC GTGAGCCACT GCGCCCGGCC 25100 
AACACTCTTT TTATTATTAG CAAATATACT TCTGCCTGGG CACATTCTTG 25150 
CAAGTGCTCA ACAATGCAAC TTTTGGAAGT GCATGTGGCA GAAACTCCTG 25200 
CTGTATTTAT TCCAGAACCT ATTATTGCTA ATCCCAGTTT ATGTTACATT 25250 
TGAAGTGAGA ACCAGTTGGA GCCAGCAACG TTCCCAGCTC CAAAGTTCCC 25300
TTGAGATTTT CAGAATCACT TAACCCTATT ATGCTTGGCA ACCTGGACTC 25350
AGCAAAACTG GGAAGTCAGC AGTTTGTTTT ATTCATCCCT TCCTTTCTCA 25400 
GTTTCTCAAA TGTGTCAGTT AATCTCAGTA ACCCCATTGC AACCTTCATT 25450
ACCTGCCCAA GCGGTCTAGA ACTTGCCAGT ATAGAATCCT ACGTGGGTCA 25500 
AGCTCCTGAC TGTCTCCTTC TTCACTCTTT TTTTGCAAAG AACTTGTAAA 25550 
TTTTAACTAT AAGTATTCAT GATTCGCCAC ATTTAITCAA AACATAGAGT 25600 
GCTTTTTCCA CATATCAGCC AATQGAAATA AGGATTAAAT GGGAAATGAA 25650 
ATGTAGTAAT AGGATAAGCA CAAGTCTTCT TCCTGCTCAA ACTTTTTTTT 25700 
TTTTTTTTTT CAGACAAGAT CTTGCTCTGT TACCCAGGCT GGAGTGCAGT 25750 
GGCGTGTTCA TAGCTCAATG TAACCTCCAA CTCCTGGGCT CATGCAATCT 25800 
CTCACACCTC AGCCCCCTGA TTAGCTAGGA CTACACTATG CCTAGCCAAT 25850
TTTTTTTCTT TTGTCTGGTT GTGTTGCCCA GGCTGTCTCG ATCTCCTGGC 25900 
CTCAAGTAAT CCTCCTGCCT CGGCCTTCTA AAGTGCTGGG ATTATAGGCA 25950 
TGAGCCACTG TGCCCGGTCT CAAACCTTTT TTTCCAAAGT AAATGAAGTT 26000 
ATTAGATATG GAATATAGTC TAGTTCCCAG ATATCCATAT CCATTGGTTT 26050 
ATTACCCTCA TTATTAACTT CAAATTGTTT AATAGACCCT CATATCTCAG 26100 
TTATACAGTT AAAATTTTTG TTTTGTTTTT CTGGAGTATC TTATTTATAA 26150
CTATGAGTTT TACTTTACTT ATTTATTTTA TTTTTTGAGA CAGACGCTTG 26200 
CTCTGTCACT CAGGCTGGAG TGCGGTTGCG TGATCATGGC TCACTATGGC 26250 
CTCGACCTTC TGGGCTCAAG TGATCCTCTC CCTCAGCCTC CCAAGCTGAG 26300 
ACTACAGGCA TGCACCACCA CATCTAGCTA ATTTTTTTTT TTCCCCATGG 26350 
AACAAGGCTT TACTATGTTA CCCAGAGTGG TCTCAAACTC CTGGCCTCAG 26400 
GGGATCCTCC TGTCTCAGCC TACCAAAATG CTGGGATTAC AGGCATGAGC 26450
CATAGCGCCA GACCTGGTTT TACTTTTCTT GACTTTGAAT TACAAGTTTT 26500 
TGTAATTTGG AAAATGTTTT GTTGCTTTTA AATACTGCTG TAT6TTTGCT 26550 
TTTAAATACA ACATTTCTCG ATATATATTT TGAGAATTGC TGTCTTTCAG 26600 
AACCTAACAG TTTCCTTAAG AAGGCTGATA TTTTCATCAA TGGGTCGCAG 26650 
TTAGGAGAAG ATTTTATTCA ATTGCATAAA CTTCTAAGAA AGTCCACCTT 26700 
CAAAAATGCA AAACTCTATG GTCCTGATGT TGGTCAGCCT CGAAGAAAGA 26750 
CGGCTAAGAT GCTGAAGAGG TAGGAACTAG AGGATGCAGA ATCACTTTAC 26800 
TTTTCTTCTT TTTCCTTTTG AGACAGAGTC TCACTCTGTC AGCCAGACTG 26850 
GAGTGCAGTG GTACAATCAT GGCTCACTGC AACTTCXSACC TCCCAGGCTC 26900 
AAGCAATCCT CCCATCTCAG TCCCACAAAT AGCTGGGACT ACAGGTGCAC 26950 
ATCACCACAC CTGGCTACTT TAAAAAAATT TTTTTGTAGA GATGGGGTCT 27000 
CCCTGTGTTG CCCAGGCTGG TCTCTTGAAT TCCTGTGCTC AAGCCATCCT 27050 
TCCACCTCAG CCTCCCAGAG TGCCAGGATT ACAGGCATGA GCCACCACAC 27100 
CCAGCCACCA CTTTTCTTAA AAAAAAAAAA AGATTCTCTC TGGTAGACAA 27150 
TCCTCAATAG TCCACATGTT ATTAAACAAT CTGCTGCCTG AATACATGAT 27200 
TTACCAAAAA AAGGAAATTT TGACGGGTTC AGAATATCAA GGGATCTGAG 27250 
GCAAATGTCA CCTATGATAA AATTTGCTAT CAAAATTAGG AAGTTTGTGT 27300 
TTACCTGATC CTAAAGCAGT AACCAGCCCA TTTCTAQGGA ATAAAACTCT 27350 
CATGCGTATA TTGTGCATAT ATATGTATTA TATGACTGAG TGATAATAAA 27400 
ATTTTTTTTC TAGCTTCCTG AAGGCTGGTG GAGAAGTGAT TGATTCAGTT 27450 
ACATGGCATC AGTAAGTATG TCTCCTATTC TTAATACTAG GAAAGTAAGG 27500 
CTAGCTTTAT TTATTACCTA GTATTCAAAA AGTTAGTTCA TTTAACTGCC 27550 
AATTGACTGC AGTTCAAATA AGAAACAAAT AGTGTCTCAA GTAGCACTGT 27600 
ACTCCAATTT TAATATTAAT AAAAAAAATT TTAAGTTATT TTAAATAATG 27650 
TAGTGGTTTC TATAAAGATC ACTTTATACA GAAGAACAGT GCCAATTAAC 27700 
CCATGGAACA TATAAGTAGC TAAAACCAAT TGCTTGCCAA AGAACCAGTA 27750 
ACCCA6GAGT ACATGTCCTT GCCACTGTGT TTTTTCAAGA CAGAGTAACT 27800 
GATTTCTAGT TACTTGCATA GAATGGACTC CTCCTCATAA CTCCCTTCCA 27850 
TCTTGGTCTT TCCCTAGTAG AACTTCTACC TTTTTTTAGT AACAGGTGAG 27900 
TGGGAGAGGT AAGAAGGAGA ATAAGGTCAG CAATTAACCT AAAAGCAGAA 27950 
AGTAAAATTT GTTATTTTTT TTCTGAATAT TTTCTGTGTA ATTTAGCTAC 28000 
TATTTGAATG GACGGACTGC TACCAGGGAA GATTTTCTAA ACCCTGATGT 28050 
ATTGGACATT TTTAnTCAT CTGTGCAAAA AGTTTTCCAG GTAATAGTCT 28100 
TTTTAAACTT TTTAATGTAA AACCAGAATC CTTATTTTAT AGTCTAGCTA 28150 
GTTCTAAATT CTATAGGTAT GTATATTTAC ATGTTTTTCT AATTTTAGAG 28200 
AACAAGCACT ATGACTTATC CACTGTTAGT TTTCCCCTTA GCATTGGGTC 28250 
TTACCCCATQ TACGTGATTA GAAATTTGAA ATATTTCCAA TAGCCTTTAG 28300 
TAGAATTAAC TCACATAGAT GATAAGAATG GGTTGGTTCA CTTCATGTTC 28350
CTTCCACAGC CTACTATTTC AATAAAAGAA AGTTTCCCAA GACCTAAATG 28400
ACTMGAACA TATTTTATAA CTATATAGGA GGGGTGGGTC TAGGAATACA 28450 
AAGTTTTGAA TGCTGTTAAT CTTCAACACC ACAGTTGAAA CCACAGGTCA 28500 
GCTTTTTTGC AATTACCATG GATACTTTTC TGTTCTATAG GTGGTTGAGA 28550 
GCACCAGGCC TG6CAAGAAG GTCTGGTTAG GAGAAACAAG CTCTGCATAT 28600 
GGAGGCGGAG CGCCCTTGCT ATCCGACACC nTGCAGCTG GCTTTATGTG 28650 
"'"^CCTT AGGGGTCAGA GTGCftGCTCT TCTCCATCCT 28700
^^^^3^ TGAAATAGCT CCCC31GCCAA AAAGCAGATC AAAGACXX3Tr 28750 
rSflfS^ AGCCCCAAAA TTCATGCCAG ATTTTGCAAG AAAATOATTT 28800 
^S^^"^ TTTAACAAGT GTTCCAAATT AATCACTATA 28850 
^^ISi^^ TTTTGGCCTT TAATTATGGC CCATAAATAT 28900 

^^^SI^J CCTTACTCTA AAGAAGTACA CTGTAAAAGA ATGCATATAG 28950 
n^^^^^ TAGTTCCCTG TAATCCCAAT ACTTTGGeAG GCCAAGG'TOG 29000 
^^^^S TGAGCCCAGG AGTITGAGGC TGCAGTCAGT TATGATOGTC 29050 
^S^^^ CTAOACTGGG CAACAGAGTG AGACTGTCTT TTTTTTTCCC 29100 
CAGACTGGAG GGCAGTGGCA CGATCTCACC TCACTCCAAC 29150 
SS^^^ CGGATTQAAG CGATTCTCCT GCCTCAGCGT CCTOAGTAGC 29200 
TGGGACTACA GGAGTATCAC CGCACTGGGC TAATTTTTGT ATTTTTAOTA 29250 
GAGACGGGGT TTTGACATGT TGCCCAGGCT GGTCTGAAAC CCATCAGCTC 29300 
AAGTGATCTO CCTACCTCAG CCTTCCAAAA TGCTGGGATT ACGGACATCA 29350 
?5ISa™ SSf^'^"^ CCTGTCTCTT AaSSS^ ^S^JS 2I40O 
TTAGAGCATA TTACAGCTTT GTCTCTCAGG AGGATACTTA GTGTAK3TAG 29450 
Sr™^^ TAGATTCCCA AGAAGTTTAG AGCCTAAAGT ATGAG^W 2950^ 
rr«r»^ ^T^JSST** ATTTAAAGAT TTGTTAAATC ATOTCATTCT 29S50 
^f^^S^^ AAACTTGATT GCTTTAAAAT ACTGGTTTAO TTACATTTAG 29600 
TAACTCTMT AGTGCTTTTA ATCTATACTG CTATATCCTC ACATTCAGAT 29650 
TTTTTTTCTT TTCTCTTCCA TCTTCATTCT TTTTTCTCTC ATCC^ATTC 29700 
S^J'^'"^ ACAAATCCTT TATGCCCAT« GAA^^S^ 29750 
TeTTTTGCCA TTAACTAAAG ATCTGGGGTG 29800 
TCGGGGAGAA GGGGGATAGA GAAGGAGAAG TGGGAAGAOG TCTCCATAAT 29850 
AGCTTAGGTG CAATTCTGCT TATTTTACAT nTACCCC^^ ^^^SS^ 29900 
CTTTTTCTTC AGCCCTCACA CATTGTTTGT GCAGGGACCT CATAGGACCA 29950 
GGAATTGTCT ATAGAGGTQG GAATrrGTCT CACCCTGAAA GGGATACCTC 30000 
^sSSfiJ ^TAGTCITCT AGGATTTCTT ATCATATXMA iiS^S^S^ 30050 
SS?™*" CTGCTGCTGC TGCTGCTGCT GCATGCAGTT GCCATTTCAT 30100 
TTAAATCACT TATTTATAAT TGATCACACT TTTCTGGCTT CCTCTTAMT 30150 
CCTCCCTCAA AGATCAATAA ACCAGAACCA GGCATCGTCG CATCCACTTG 30200 
^^f^^'^^ AGGTTCACCT TCCCTCCTGT CTAOAtSaG lollo 
CCAATTATCA AGACAOGGGA ATTGCAAAGG AGAAAGAGTA ATTTATGCAG 303O0 
AOCCAGCTGT GCAGGAGACC AGAGTTrTAT TATTACTCAA AT^^J^c 30350 
CCGAACATTC GAGGATCAGA GCTTTTAAGG ATAATTTCGC CGGTAgSgC 3M00 
I^iS**^ GAGAGTGCTG GTTGGTCAGG TTGGAGATCG AATCAC^ sS^SO 
^rSS^ ^^'^^ ^'^^^'^ ST^^AT GGGA?5S^ 3^500 
AACTGGTTGG GCCAGATTAC CGGTCTGGGT GGTCTCAAAT GATCCACCCA 305^0 

'^^"^ 30600 
^J^t^ ^'^'^^^ ATTTGGGGAG GTTCAGACTC TTGGAGCCAG 30650 
AGGCTGCATT ATCCCTAAAC CGTAATCTCT AATGTTGTAG CTAATTTGTT 30700 
^ScSS AGGTAGACrr GTCCCCAGGC AAGaI^ 30750 
AAAGGGCTAT TATCATTTTT GTTTCAOAOT CAAACCATBA ACTGAATTTC 30800 
l^^^r r^"'^*^^^ TACACCCAGG AATCAAGAAG G^^^ 30850 
AGGTTAGAAG CAAGATGGAG TCAATOAGGT CTGATCTCTT TCACTGTCAT 30 90O 
AATTTCCrCA GTTATAAITT TTGCAAAGGC GGTTTCAGTC cSg^^^ 30950 
GGGAGGCTGA GACAGGAGGA TTAATGGAGC CCAGGAG^TT G^SSS^ 3^000 
^AGCTATGA TCAOGCCACT GCACTCCAGC CTGGGT«IS G^S^ 31M0 
fl^SI^i ATAAATAAAT AAGTAAATAA ATAAATACAT AAATAAAATC 31100 
AAGATGGTGT GCAATTAGAA TTGAGCGATT TTGTTTCCAA ACCTCAAGAA 311SO 
AG^TCT TGCTCTGTCC CAGGT6GCTG aii^ 3^2^° 
CCGAATGGGA ATAGAAGTGG TCATGAGGCA AGTATTCTTT GGAGCA^ a^SO 
J^^^IT ^^^^■"SAA AACITCGATC CTTTACCTGT AAGTGACCAT 31300 
TAn-TTCCTA ATTCTAGTGG AGTAGATTAA AGTCAACTCA OGACCTCTCG 31350 
IS™!^ S:''"^"^ 3^400 
AOATGATGAA TTAGAAGGAG CCTTAGATAG CATCCAATCT AACATTTTTT 31450 
TCTGTGTTTG AAQAGAAGAA ATCAAGAGCT AGGAATAACT TTTTAAAGGT 3^50^ 
AAGCCATTTG CAGTATAGTG TGGATTTTGT TTAAAAGGGG ATAATTTCA^ 3^550 
ilSI^^^f 15^"''™'=* AGACAAAATA AGTTGGATTT TCAAAMTT? llllo 
. ATCAAAGTTA TAATTGCCTA CAGTACOCAA AGCTTCAAAA 31650 

CATTTTTTAT OTTATQAAAT TGTAATTTAT TTAACCTTAA AATCAGCCAG 31700 
TACCATGTGT TTGCTTAAAA ATCTCATGCT AAGAATTTAC TATG^^^WA 3^750 
ATAATCrrCA AGATATTTAT GAATAAAGTC TTATTTCTM TcSj^C lUlo 
CAGGAAATGT TTC^^^ 3^850 
GGAAOATCTG TATGTCTAAA TATATGTCAG GGATAATACA GATCTAGCCC 31900 
GACCTTGATT TTTATAGTCT AAAATOTCAT T7X;CA^ATAT llllo 
CTATTTTCTA AGAATAAITC CTAAAA6AAT TATTTCAATC TTCTAGoSa 32000 
GCTAAGAAAT TTTGCAAAGA GCGTACGTCA AAATATAAGC TAGGCTTTTG 3205^ 
S^^SS^ CAACAAAATT GCTTTTTATC TATAG1X3ATC 32100 

CAAOCTTGTG GAACATATTA GTCATCTTTT TTTAGAAAAT TCTTAGAAAA ttltr, 
^'^'^ AAAAATGGAA TTTATCTTTC CCCAA^I ^^^S^ 3L0O 
l^J^S^" ^"^ft gCAT AGTAATTTCA CCAGACAAAC ATTCAAAATC 32250 

TAOSAACTCT TTGTATATGC ACTAAATATG CITCTCCTTC 32350 
AAGGTTCTCA GTCAGCTAGA AAAATGTGCA AGAGTAAATG GTACCCTTCT 32400 
CACTTGTAGA TCCAAGAGAA TTAGACTTAA ACTCACTCTA CATGTCTGTG 32450
ACTTTATTTT ATTTGCATGA CAGTCCTGTG AGGTGGCAAG GCAGGTATCT 32500
TGGATCCATT TTTTAGATAA GGAAGTTCAA ATTGAGAAGA GGTTGCATGA 32550 
TTTACAGGAA GCCATACTGT AGTCCTATGT TACTCTTAAA AATCCCATTC 32600 
AAATCCTGCT TCTGAGGCCT GCATACTTTC TACCCTACCA GTCATTGACC 32650 
CATGCTTATG TCTCCTTTGA AAACATTGAT TCCACTCTTG TCTCCAGTGA 32700 
AAAAGTGGAA TTTAAGCAGA GAAACAAAAG CCATTTGTCT TGTTAAGTCT 32750 
ACTTTCCCTC TACTTTCAAG AAGGAAAGTT GGGGTATGTG TTGAATGGTG 32800 
ATTTATTTAT TTATTTATTA TTTTAAAAAT TGATACAAGG TCTTACTGTA 32850 
TTGTGCA6GC TGGTCTCAAA CTCCTGGGCT CAAGTGATCA TCCCACCTCA 32900 
GCCTCCCAGT GTTGGGATTA CAGCATGAAC CATTGTGCCC ACCACCGATC 32950 
CGCAGTTTTT TAAGAAAAAC TTTTACTATA GAAAATTTTA ATCATATACA 33000 
AAATACAGAG GAAAGTATAT GAACCCACTT TAGGAGACTA GAATAT6CCA 33050 
CCCCAAAATA TGCCACTTTG GCATAAGGAT TATTTC6AGC TAAAGGCAAC 33100 
TGGGAAGAAA CACATAGAAG AAAAGTTCTC TGTCCTTCTC CATTTGCCTA 33150
AAAGCAGGAC ATGAATCTTA AAAGTCCCCC TCCTTCCCTT TCTAOCAGGA 33200 
AAAACAAGAG TTAATCACTG AAGATAACTT CAGACCCTTA TCAGTGTAGA 33250
GATGGCACTA GAAGAATCTA TATTACATAC TCATTTATTT TCCTTCCCAC 33300 
AACTTGCCAC CCCAGAGACT AAAAATCCTT TTCCTTTGTC ATGTCTCTTG 33350 
TCCAAAAATT TGCTCTATAA GCTGGAGTTC TAAGCCACCT CTTTGAGAAT 33400 
TACTTGTTCC CTGGTATTTT CTGTTAACAT ACATGTATTA ATATACATGT 33450 
TAACAAGCTT CTGTTTGTTT TTCTCCTGTT TTCTGTCTTG TTACAGAGGT 33500 
CCATCCCAAC TAAGAACTAA AGAGTAGGAG GAAAATATAA TTTCCTCCTG 33550 
CATACTTTGA TCTTGTTTAA TCCGTAACCC TTCCCACTTT TCACCTCCTA 33600 
CCTATTAGAT TACTTTGAAG CAAATTTCAG ATATATTACT TTATCTATAA 33650 
ATATTTCAGT ATGTGCTAGG TGTGGTGGCT CACACCTGTA ATCCCAACAC 33700 
TTTGGGAAGC TGAGGCAGGA GGATCACTTG AGCCCAGGAG TTCAAGACCA 33750 
GCTACGGCAA CAAAAAATCA AAAACTTATC TGGGCATGGT GGCACATCCC 33800 
TGTGGTCCCA GCTACATGAG AGGCTGAGGC AGGAGGATCG CTTTAGCCCA 33850
GGAGGTTGAG GCTGCAGTAA GCTGCATTCA CACCACTGCA CTCCAGCCTG 33900 
GGTGACAGAG TAAGACCATG TCTCAAAAAA ATACATATTT TAGTATGTAT 33950 
CCTTTTTGTA AAAACACAAT ACTTTTATCA TACTTTAAAT AATAACAATA 34000 
ATTCCTTAGT ATCACCAAAT ATTTTGTCAG TGTCTCACAT TTTCCTTATT 34050 
GTCTAAAATA TTGTTGATAG TTATTCAAAT CAGAATCCAA ACAAGGTCCA 34100 
TATATTACAT TTGGTTQACA AGTCTCTTAA GTTTGrrCAT CTTTAAGTTC 34150 
TTCCTCCCTC TCTTTCATCT CTTGTAATTT ATTAATGTGA AAAAACAGGT 34200 
AATTTGTTCT ATAGTATTTC CTACATTATA GAGTTTGCTA CATTTATTCC 34250 
CTATGATATC ATTTAGCATG TTCCTCTGTC CCCTGTGTTT CCTGTAAACT 34300 
GGTAGTTATA CCTAGAAGCT TGAGTTTATT CAGGTTTTTA ATTGTATTTT 34350 
TTTTGCAA6A ATTCTTTATT ATCTGCTTCT GGAAGCACAG AATCTCTGGT 34400 
TGTGTCTGGT TTTGATCTTG ACAGCTACTG ATGACCATTG CCTAATCCAT 34450
TACTTTATTG GGGTGGGGOG AATAAGGTTT TAAAATAAAT TTTTTTTAAA 34500
GATTTTTTTA ACTGTTATTT TGAGACAGTG TCTCATTTCG TTTCCCAGGC 34550
TGGAGTGCAG TGGCACAATC ACGGCTCACT GCAGCCTTGA CCTCCTGGGA 34600 
TCAGGTGATC TTCTCACCTC AGCCTCCTGG GTACCTGGAA CTACAGGTGC 34650
ACACCACCAC ACCTGGCTAA TTTTTTGTAT TTTGTGTACA GAAGGGGTTT 34700 
CATCATGTTT CCCAGACTGG TCTTGAACTC CTGGGTTCAA GTGATCTACC 34750
CACTTCAGCT TCCCAAAATC CTGGGATTAC ACTTTGGCCA CCGTGCCTGG 34800
CCTAAATGAA ATTATTTGTC TCTAAACAGA CAGAAGTTTT ACTTTAAAAA 34850
TTTGTCTTTG TGTGTACATG TGTTTGTGTA TGTGTGTGTG TCTAAAAGTT 34900
TGGCTTTGAG CTTTGCTTTG AATTCTTGGA TGAACAATAA CCAAGAATAC 34950
TTAAACTCTG ATCATTCTTG ACAGATATCC CCTACAGGCT ATGGCCTTTT 35000 
GAATTGTGTC CTCCAGTGAT AAAAAGCAGC AAGCACGATA CTGCTCTCAG 35050 
ATTCATGGTG GTCACATGTG AGGTGAAAAA AAAAAAAAAG ATGAATCCTA 35100 
TTTAAATGCC CCCAGGATAA CAGTQATACT CTTTGTAGGA TAACTATTTG 35150 
CTTGCCACTG GTTTCATTAA ATAAGGACAT AAGTAAAGAT CTATTTTTGT 35200 
CTCTTTCTCC CCAACCACCA CAACTAGGAT TATTGGCTAT CTCTTCTGTT 35250 
CAAGAAATTG GTGGGCACCA AGGTGTTAAT GGCAAGCGTG CAAGGTTCAA 35300 
AGAGAAGGAA GCTTCGAGTA TACCTTCATT GCACAAACAC TGACAAGTAA 35350 
GTATGAAACA CACCCTTTAC CAATCATCAA GTTTTAGTGG GTAAGCCTGT 35400
AACTTTACTC AAACACCCTG TTGCATGTGT CTATACATTG CATAAGTATA 35450 
GGCAGTTGCA ATTTAGTAAA GTTTTATACA ACGATTTTAT TTTATTTTAT 35500 
TTTTAGAAGA AAAATGCTAC TTTT6TTGTT GTTGTTTTTT GAGACGGGGC 35550 
CTCGCTCGTC ACCCAGGCTG GAGTGCAGTG GTGCAATCTC AGCTCACTGC 35600
AACCTCCGCC TCCCGGGTTC AAGTGATTCT TGAAGAGGAG AACAATAATA 35650
ACAACAATAT TATTTTCAAA AGTTGTGACC GCAGTTTCTG GAGTTGAGAA 35700
GACATCGAGA TTTTTGTAGC CTCATACTCT TGCTTTAGGT AGCAAAAAAT 35750 
GTTCCTAAAT CTCAGGAATA TTCTCTAGAT AGGTTTCAAT CTATCATTCC 35800 
TGATAAGATG ATGCTGAAAT ACTAATTCTA GCCAAAAAAG ACCAGCTACC 35850 
ATTTCCGATT GTTGGGGACT GGGAACTCTG GATAGTGAGG ACCCCAGTAG 35900 
GAAGTAGCGA GGGGAATGGT TTGAATGGAT AAATTCATAA AAAATGTCAG 35950 
TAGATTTAAT TTTCTTATAC ATTTCAGTCT TTTTATAAGG CTAGGAAAAG 36000 
CCCCTGTTTT TATGGTTTAT AATTTGAATT CACATGAACC CACAAAATTT 36050 
GCCTTTTACC TTCCTATGTC TGAAAATGGA TAGTCTGGCT GGCCTCTTAA 36100 
CAACCCAGCr GGCAGAGCTG TGAGGATCTC AGTGTGCTCT AGCCCAGACA 36150 
TTGGTAGCAT GAACGGCAAC ATTTTTAATT GTGITTTCAA AATAGGAGCA 36200
CACTAGCX5GT CTAAAACGAT CATAAAAGAA GGATACTAAG AGGGCCCACT 36250 
GTCATTATGG ATCCTAATAC TTAGGATGCA TTATGGATTG TCATTATGGA 36300 
TACTAATACT TAGGATCACA TTTGTAATTG AGTTTTTAAT TGCTTAAATT 36350
AGATACATAT TTCTATTAAG TTAACCTCTT TGCTTTTAGT CCAAGGTATA 36400 
AAGAAGGAGA TTTAACTCTG TATGCCATAA ACCTCCATAA TGTCACCAAG 36450 
TACTTGCGGT TACCCTATCC TTTTTCTAAC AAGCAAGTGG ATAAATACCT 36500 
TCTAAGACCT TTGGGACCTC ATGGATTACT TTCCAAGTAA GTAATTTTCC 36550 

TTGTTCATTC CAAACTTTCA ATAAATTTAT TGGTGTTTAT CAGAATAGAG 36600

AGTTTGGACA GGGAGCAAAA GACAAAGTCA ACTATATCAA GTTCTAATAA 36650 

TTCTTAATAT TCAGGAAATT TATGTATGAA TACTTACTAA TATGAGTATA 36700 

ACTCATCCTA AGAGTCTAAA GCAAAAGGAT GTGAACACAA ACTAGCAGTT 36750 

ATCTTAGAGA ATAAGTTTGC ATTTCAAAAT AACTTGACAT ATCAAGATCC 36800 

ACTCAACGCA TTTAAATTAT TTACTCTAAA AAGACATAAT TCTTGGTAAC 36850 

ACATTCACTA AAGCAAAATA TACCTTTATA TAATTGCTAT CAAAGGTATG 36900 

TGGGTTGGTA TAAAATATCA TACCATGTGA GATCAGTGTG ATTCCTTTAC 36950 

AGCATTAATT TTTATTGGTT AGAGTAAGAA AAAGAATAGC TAGAGTATAT 37000 

TTCTTAAGTA GATTCTCATA CACTTTGGTT TCAAAAACCA ATTATTGACT 37050 

ACATCTTATA AAAGCCTGTA TTCAATGGAG TGCXaUUUWV TGACTATGAG 37100 
TCTTAAAGAG TTAGGCATAT AAATATTTTA AGGTTTCTGT TCAATGTATG 37150 

TTGGAAGGAG TTCCTTTCTC ATGACTATTC TCATATTGGA GCATAAAAAG 37200 

AGTTTACAGG CTTGGCGCAG TGGCTCATGC CTGTAATCCC AATACTTTGG 37250 

GAAGCTGAAG CAGGCAGATC ACTTCAGCCC AGGAGTTTQA GACCAGCCTG 37300 

GGCAATATGG CAAAACTCTC TCTACAAAAT ATACCAAAAT TAGCCAGGCG 37350 

TGGTGGTGCA TGCCTGTAGT CCCAGCTACT TGGGAAGCTG AGGTGGGAGG 37400

ATTGCTTGAG CCCAGGGGGG TCATGGCTGC AGTGAGCTGT GATGGTGCCT 37450

CTGTCACCCA GCCTGGGTGA CAGAGTGAGA CCCTGTCTCA AAAAAATAAA 37500 

TAAATAA7UUV TTAAGAGTTT ACAAAATTCT CACCATCTCC TCCCATCTTT 37550 

GCAAATGCCA CATAAGTGAT GTGTTCCAGG ACTATTAGCC TCGGAACCTO 37600 

AGGCAGTACA GTAAGCACGC TTTCTCCAAA GTCCTGTCCC CCACAGACAA 37650 

ACATTATTTA CACTGGGTAC TGCTCTTTTA TTTTTTCCCC TCTATGCTTT 37700 

ATTTTACTAT AACTATAATC ATATAACATG TAATAGGAAA AAGGCAGGGT 37750

CGGGGGAGAG ATCCAGAAGT CTTCCCAAGA GCCTTTCCAA CATAGCCTCT 37800 

GTAGACATTT rTTCTTTCTT CTTTTTTTTT TTTTTTTTTT TTCT6AQACA 37850 

GAGTCTCACT CTGTTGTCCA GGCTAGAGTG CAGTGGCGTG ATCTAGGCTC 37900 

ACTGCAACCT CCGCCTCCTG GGTTCAAGCA ATTCTCCCAC CTCAGCCTCC 37950 

CTAGTAGCTG GGATTAGAGG CATGCATCAC CACGCCTGGC TAATTTTTGT 38000 

ATTTTTAGTA GAGATGAGGT TTCACCATGT GGGCCAGGCT GGTCTTGAAC 38050 

TCCTGACCTC AAGTGATCCA CCTGCCTTAG CCTCCCAAAG TGCTAGGATT 38100 

ACAOGAGTGA GCCACCGTGC CCTGCCCCTA TTACATTCTG ATCACACATT 38150 

TCATGTTTTA TAATTGGAAA ACTGGTGAAA TTATAGACAA TGTTTTGTTC 38200 

CCCTAAATTC TCTTTGATGA GTATATATTA CTTACACTCT TCTGTCTTTA 38250

AAATTTTGCA AAATAGTATC CTAGATAAGT TTATGAGTGC ACAGTCTGTA 38300 

CGCTTACTCA TATTAATGAC CTCGGAGAGT TAAACAACAG TCACCTTTAA 38350 

AAATTATTAC TATCATTATC ATTATTTTTG AGGCGGGGGT CTCATTCTGT 38400 

CTCCCAGGCT GGAGAGTAGT GGTGCGGTCA CAGCTCACTG CAGCCACCGC 38450 

TACCTGGGCT CAAGTGATCC TTCCTCCTCA GCCTTCTGAG TAGCTGAGAC 38500 

CACAGGCTTA TGCTACCACA CCTGGCTAAT TTTTTAACTT TTTGTAGAGA 38550 

CGATGTCTCA TTATGTTGCC CAGGCTGGTC TCAAACTCCT AAGCTCAAGT 38600 

GATCTTCCTC A6CCTCCCAA AGTGCTGGGA TTACAGGCAT GAAAAACTGC 38650 
ACCCAGCCCT AAAAATTATT AGGGTCCTGC ATAGTAAGAC TTTAATAAAT 38700 
ATTTAAATGA ACATCTGGTT TTTTTAAAAA AAAAATAGAG ACAAGGTCTC 38750 
ACTATATTGC CCAAGCTGGT CTCGAACTCC TGGACTCACG CAATCCTGCT 38800 
GCCTTAGCCG CCCAAAGTGC TGGGATTACA GGCATGACCC ACCTCATCTG 38850
GGCTGAGTGA ACATATTTTT AACATAAAGG CCGTATTTTA TATTTATCTC 38900
ATACATTTTG CCCAGCATCC CCATTTCCGC CGAATCTGTT GCTTGCTAAT 38950 
TCCTTCCAGC TTCATTTCAT CTGAAATTTG ACAAACATCT TCTATTTCTT 39000 
TGTCGTCATX3 TTATTGACTT CAGAATATAA AATAAAACAC TATACCCAAA 39050 
TTAAACCCCA CCCTCATTGC CCAGCCTGAT GTGAAAATAA TCAGCATACA 39100 
TTAAGCTTAC CCTTGATATA TGTGTAGCAT CTTTTAGATA AATATACAGC 39150 
TGATTAAGCA ATATAGCCTG ATGGTATAAT ATCTTGCCCA TGTACCTCAT 39200 
CTTATCTCCA GCAGGATTAA TTCACAGTGA TCAGATTTAC CTTTAAACTT 39250 
TGTAGCAAAA TATCCTCTCC AAAAGCATAT CTAAAACTTT TGTGTGTACT 39300
CTTGCAAGTT TCTTAATTTC ATGCAGAACA GGCTCTTACC ACTGTTAGCT 39350 
GGAGATATTT TCAAGACCTA TTTTTGTTTG TGGTTTCCTG ATGATGGTCA 39400 
TGGCATTTCC CCCTTCACTC CATCTAAAAA TTGAGGTGAT ACAGGCTTTT 39450
AAACAAAACC AACTCATATA GACTGAGTAC AACTGCAATG CAGGCATGCT 39500
AACCTCTGCT ACAATCATGG GCGTGCTATT GATATGTCTT AAGTTACAGA 39550 
ACACAGGGCT GAGCGTCTCA TTAGGTCAAA ATGTAAACCA GTTTTTCTGC 39600 
TCACTGATGC TTAATGAGGA CAGGGTGTGA GAGATTTCTT TAAGGAAAAC 39650 
AAATATATAA TAATGCTACA TGGAAAAATA TCTAACATTA GAGAATTAAG 39700 
TAAATAAACT AATATACTCA CACCATGGAA TCTTGTGCAG ACATTAAAAT 39750 
TATGTAGTGG ATGGATGTTT AATGGTGTGA GAAAAAGTTA GGATGTGCTG 39800 
GGGTGGGGGG AAGAATCAAG TTTTAAGAAA ATACAGTATA CCCATACTTA 39850
AGTAAAAAAA AAAAAAAAGG TATGTACAGT CATGTGTTGC TTAATGATGG 39900 
GGATACATTC CGAGAAATGT GTCGATAGGT GATTTCATCC TrGTGTXSAAC 39950
ATCATAGAGT GAACTTACAC AAACCTAGAT GGTCTAGCCT ACTATGTATC 40000 
TAGGCTATAT GACTAGCCTG TTGCTCCTAG GCTACAAACC TGTAAAGCAT 40050 
GTTACTGTAG CGAATATACA AATACTTAAC ACAATGGCAA GCTATCATTG 40100 
TGTTAAGTAG TTGTGTATCT AAACATATCT AAAACATAGA AAACTAATGT 40150 
GTTGTGCTAC AATGTTACAA TGACTATGAC ATTGCTAGGC AATAGGAATT 40200
ATAATTTTAT CCTTTTATGG AACCACACTT ATATATGCGG TCCATGGTGG 40250 
ACCAAAACAT CCTTATGTGG CATATGACTG TATACATGTA CACAAAAAAT 40300 
AGATGAAAQA ATGAATATAC ATCAAAATAT TTAAAATGGT TATAATGACT 40350 
TAGGTTACTT TTATTTATCT TAGTAATAAT AATGATGATA GATAATACTT 40400 
TTATAGTGTT TACTATATAA AAGACACTGT TATAAGTGTT CTACATACTT 40450
TACATGTATT ACCTAAATGA TATAAATATA ACTCTGACAG TAACTAATCT 40500 
TATACGTTCT CTTTTCTTTT TTTTTTTTTT CTTTTTTTAG ACAGAATCTT 40550
GCTCTACCAG GCTGGAGTGC AGGGTGCAAT CTCGGCTCAC TGCAACCTCC 40600 
GCCTCCCAGG TTCAAACGAT TCTCATGTCT CAGCCTCCTC AGTAGCTGGG 40650
ACTACAGGCA CACACCACCA TGCCCX5GCTA ATTTTTGTAT TTTTGGGTAG 40700
AGATGGAGTT TTGCCATGTT GGCCAGGCTG ATCTTGAACT CCTGGCCTCA 40750
AGTGATCTGC CTGCCTCAGC CTCCCAAAGT GCTGGGATTA CAGGTGTGAA 40800
CCACTGTGCT CGGCCTAATC TTACAAGTTT TCAATATTTA AAGAGTGCTA 40850
ACTTTGTTGA CAATATAAAA CATATTTGAG AAAAAGAGAT ATAAGCATCT 40900
TATTTAGAAT TATGAAAATA TCAATAGACC TACAGCCGAC TAAAGCTTTT 40950
CTTCATAAGC TCTTGCCTAT ATTGATTCGC TCCTGTGAAT ATGCATTAAT 41000 
TTGATTTAAA TAATAAGTAT GTATAAGAAA TAACACTTTT CCTTAATTTT 41050 
TAAGAACGTT CAACAGTTTT TAATTTGAAT TCCAATAGTG AAATACATAQ 41100 
AAAATATAAA ATTTTCTGTA GTTTAGCCAA ATTGTTTTTG TTTCACCACA 41150 
GCATTCTACC AAAATTTCTT AATAACAGTA AGAAAATGAA TGCATACCTC 41200
CTGCAGGGAG AGGGGAGTTA GGCAGTTTAT GGGCATAGTT ACAAQTGAGA 41250 
AATTTCATTG GCTACCATTT ACGCTAAATT CATAAAAACT GCATTCAATT 41300 
CTATATATCT ATTTTCTTTA CATAAAAAAG GTTTCAATTA TTGGCCATTA 41350 
AATAAAATAG CCACCATTCC AGAAGTTGTG TCATGTTTAT CCTTTTTATA 41400 
CCACCATCAT ATTGCCTATT ATATAGATTG TGTGTGTTCC ATTTTCTGTA 41450 
ATGGGCCAGA CAGTAAQTAT TTCTGGCTTT GGAGTCCATA TGGTCTCTAT 41500 
CATAACTACT CATCTCTGCC ATTGTAGCTT AAAGATTATC TAGGTCAAAT 41550 
GCCTAAGTGA TATAGTGTTO AAATACAAGT TATATAATAT AGGCTGCCAC 41600 
AAAAAAAAAT TTATTTGGTC TAAAAAAGAT TTCATGACTT TTGTAGCAGC 41650 
ATGGGT6GGG CATGCACCAC TTGGTTAACT CGGTGTATCT TTCTCCTTTG 41700 
CAGATCTGTC CAACTCAATG GTCTAACTCT AAAGATGGTG GATGATCAAA 41750 
CCTTGCCACC TTTAATGGAA AAACCTCTCC GGCCAGGAAG TTCACrGGGC 41800 
TTGCCAGCTT TCTCATATAG TTTTTTTGTG ATAAGAAATG CCAAAGTTGC 41850 
TQCTTGCATC TGAAAATAAA ATATACTAGT CCTGACACTG AATTTTTCAA 41900 
GTATACTAAG AGTAAAGCAA CTCAAGTTAT AGGAAAGGAA GCAGATACCT 41950 
TGCAAAGCAA CTAGTGGGTG CTTGAGAGAC ACTGGGACAC TGTCAGTGCT 42000 
AGATTTAGCA CAGTATTTTG ATCTCGCTAG GTAGAACACT GCTAATAATA 42050 
ATAGCTAATA ATACCTTGTT CCAAATACTG CTTAGCATTT TGCATGTTTT 42100 
ACTTTTATCT AAAGTTTTGT TTTGTTTTAT TATTTATTTA TTTATTTATT 42150 
TTGAGACAGA ATCTCTCTCT GTCACCCAGG CTGGAGTGCC ATGGTGCGAT 42200 
CTTGGCTCAC TGCAACTTTA AGCAATTCTC CTGCCTCAGC TTCCTGAGTA 42250 
GCTGGGATTA TAGGCGTGTG CCACCACGCC CAGCTACTTT CTATATTTTT 42300 
TGTAGAGATG 6AGTTTCGCC ATATTOGCCA AGCTGGTCTC GAACTCCTGT 42350 
CCTCGAACTC CT6TCCTCAA GTGATCCACC CGCCTCAGCC TCTCAAAGTG 42400
CTGGGATTAC AGGTGTGAGC CACCACACCC AGCAGTGTTT TATTTTTGAG 42450 
ACAGGGTATC ATTCTGTTGC CCAGGCTTGA GTGCAGTGGT GCAATCATAG 42500
ATCACTGCAG CCTTTTAACT CCTGGGCTCA AGTCATCCTC CTGCTTAGCC 42550
TCCCAAGTAG CTAGGACCAC AGACACATGC CATCACACTT GGCTATTTTT 42600 
AAAAAATTTT TTGTAGAGAT GGGGTCTCGC TATGTTACCC AAACTGGTCC 42650 
TGAACTCCTG GACTCAATTG ATCCTCCCAC CTTGGCCTTC CAGGTGCTGG 42700 
GATTTCTTTG GGAGTACAGC ATGGTACAGC AGGAGATCAT TTGATGTTAC 42750 
CTCTGTGCAG TGTTGCTAGT CAGCGAAAGA CTATAATACC TGTGGGGACA 42800 
GCGATTAGCC ACCACAACCA GTCTTTATTT AAAGTTATTA AAAATGGCTG 42850 
GGCGCAGTGG CTCACACCTG TAATCCTAGC ACTTTGGGAG GCCGAOGCAG 42900 
ATGGATCACC TGACGTGAGG AATTTGAGAC CAGCCTGGCC AACATGGTGA 42950 
AACCCCATCT CTACTAAAAA ATACAAAAAT TAGCTGGGTG TGGTCCTGTA 43000 
GTCCCAGCTA CTTGGGAGGC TGGGGCAGGA GAATTACTTG AACCCAGGAG 43050 
GCAGAGGTTG CAGTGAGCCG AGATTGTGCC ACTGCACTCC AGCCTGGGTG 43100 
ACAGAGAGAG ATTCCATCTC AAAAAAACAA GTTATTAAAA ATGTATATGA 43150
ATGCTCCTAA TATGGTCAGG AAGCAAGGAA GCGAAGGATA TATTATGAGT 43200 
TTTAAGAAGG TGCTTAGCTG TATATTTATC TTTCAAAATG TATTAGAAGA 43250 
TTTTAGAATT CTTTCCTTCA TGTGCCATCT CTACAGGCAC CCATCAQAAA 43300 
AAGCATACTG CCGTTACCGT GAAACTGGTT GTAAAAGAGA AACTATCTAT 43350
TTGCACCTTA AAAGACAGCT AGATTTTGCT GATTTTCTTC TTTCGGTTTT 43400
CTTTGTCAGC AATAATATGT GAGAGGACAG ATTGTTAGAT ATGATAGTAT 43450 
AAAAAATGGT TAATGACAAT TCAGAGGCGA GGAGATTCTG TAAACTTAAA 43500 
ATTACTATAA ATGAAATTGA TTTGTCAAGA GGATAAATTT TAGAAAACAC 43550
CCAATACCTT ATAACTGTCT GTTAATGCTT GCTTTTTCTC TACCTTTCTT 43600 
CCTTGTTTCA GTTGGGAAGC TTTTGGCTGC AAGTAACAGA AACTCCTAAT 43650
TCAAATGGCT TAAGCAATAA GGAAATGTAT ATTCCCACAT AACTAGACGT 43700
TCAAACAGGC CAGGCTCCAG CACTTCAGTA OGTCACCAGG GATCTGGGTT 43750
CTTCCCAGCT CTCTGCTCTG CCATCTTTAG CX3CTGGCTTC ATTCTCAGAC 43800
TCTGGTAGCA TGATGGCTGT AGCTGTTTCA TGGGCCCCTT CAAACCTCAT 43850
AGCAACCAGA GGAAGAAAAT GAGCCATTTT TTGAGTCTCC TTCATAGACT 43900
TGAATAACTC TTTTTCAGAG CTTCTCACAG CAAACCTCTC CTCATGTCTC 43950
CTCATGTCTT ATTGTTCAGA AATG6GTAAT GTGGCCATTT CACCAGTCAC 44000
TGCCAACAAC AACGAGGTTC CTATAATTGT CTCTGAGTAA CCCTTTGQAA 44050
TGGAGAGGGT GTTGGTCAGT CTACAAACTG AACACTGCAG TTCTGCGCTT 44100
TTTACCAGTG AAAAAATGTA ATTATTTTCC CCTCTTAAGG ATTAATATTC 44150
TTCAAATGTA TGCCTGTTAT GGATATAGTA TCTTTAAAAT TTTTTATTTT 44200
AATAGCTTTA GGGGTACACA CTTTTTGCTT ACAGGGGTGA ATTGTGTAGT 44250
GGTOAAGACT CG6CTTTTAA TGTACTTGTC ACCTGAGTGA TGTACATTGT 44300
ACCCAATAGG TAATTTTTCA TCCATTACCC TCCTTCCGCC CTCTTCCCTT 44350
CTGAGTCTCC AACATCCCTT ATACCACTGT GTATGTTCTT GTGTACCTAC 44400
AGCTAAGCTT CCACTTATAA GTGAGAACAT GCAGTATTTG GTTTTCCATT 44450
CCTGAGTTAC TTCCCTTAGG ATAACAGCCC CCAGTTCCGT CCAAGTTGCT 44500
GCAAAATACA TTATTCTTCT TTATGGCTGA GTAATAGTCC ATGGTACATA 44550
TATACCACAT TTTCTTTATC CACTTATCAG TTGATGGACA CTTAGGTTAA 44600
TTCCATTCAA TTTCATTCAA TTTAAGTATA TTTGTAAGGA GCTAAAGCTG 44650
AAAATTAAAT TTTAGATCTT TCAATACTCT TAAATTTTAT ATGTAAGTGG 44700
TTTTTATATT TTCACATTTG AAATAAAGTA ATTTTTATAA CCTTGATATT 44750
GTATGACTAT TCTTTTAGTA ATGTAAAGCC TACAGACTCC TACATTTGGA 44800
ACCACTAGTG TGTTGTTTCA CCCCTTGTTA TACTATCAGG ATCCTCGA 44848



(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2396 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43 
TTTCTAGTTG CTTTTAGCCA ATGTCGGATC AGQTTTTTCA AGCGACAAAG 50 
AGATACTGAG ATCCTGGGCA GAGGACATCC TAGCTCGGTC AGATTTGGGC 100
AGGCTCAAGT GACCAGTGTC TTAAGGCAGA AGGGAGTCGG GGTAGGGTCT 150 
GGCTGAACCC TCAACCGGGG CTTTTAACTC AGGGTCTAGT CCTGGCGCCA 200 

AATGGATGGG ACCTAGAAAA GGTGACAGAG TGCGCAGGAC ACCAGGAAGC 250 

TGGTCCCACC CCTGCGCGGC TCCCGGGCGC TCCCTCCCCA GGCCTCCGAG 300 

GATCTTGGAT TCTGGCCACC TCCGCACCCT TTGGATGGGT GTGGATGATT 350 

TCAAAAGTGG ACGTGACCGC GGCGGAGGGG AAAGCCAGCA CGGAAATGAA 400 

AGAGAGCGAG GAGGGGAGGG CGGGGAGGGG AGGGCGCTAG GGAGGGACTC 450 

CCGGGAGGGG TGGGAGGGAT GGAGCGCTGT GGGAGGGTAC TGAGTCCTGG 500 

CGCCAGAGGC GAAGCAGGAC CGGTTGCAGG GGGCTTGAGC CAGCGCGCCG 550 

GCTGCCCCAG CTCTCCCGGC AGCGGGC6GT CCAGCCAGGT GGGATGCTGA 600 

GGCTGCTGCT GCTGTGGCTC TGGGGGCCGC TCGGTGCCCT GGCCCAGGGC 650 

GCCCCCGCGG GGACCGCGCC GACCGACGAC GTGGTAGACT TGGAGTTTTA 700 

CACCAAGCGG CCGCTCCGAA GCGTGAGTCC CTCGTTCCTG TCCATCACCA 750 

TCGACGCCAG CCTGGCCACC GACCCGCGCT TCCTCACCTT CCTGGGCTCT 800 

CCAAGGCTCC GTGCTCTGGC TAGAGGCTTA TCTCCTGCAT ACTTGAGATT 850 

TGGCGGCACA AAGACTGACT TCCTTATTTT TGATCCGGAC AAGGAACCGA 900 

CTTCCGAAGA AAGAAGTTAC TGGAAATCTC AAGTCAACCA TGATATTTGC 950 

AGGTCTGAGC CGGTCTCTGC TGCGGTGTTG AGGAAACTCC AGGTGGAATG 1000 

GCCCTTCCAG GAGCTGTTGC TGCTCCGAGA GCAGTACCAA AAGGAGTTCA 1050 

AGAACAGCAC CTACTCAAGA AGCTCAGTGG ACATGCTCTA CAGTTTTGCC 1100 

AAGTGCTCGG GGTTAGACCT GATCTTTGGT CTAAATGCGT TACTACGAAC 1150 

CCCAGACTTA CGGTGGAACA GcTCCAACGC CCAGCTTCTC CTTGACTACT 1200 

GCTCTTCCAA GGGTTATAAC ATcTCCTGGG AACTGGGCAA TGAGCCCAAC 1250 

AGTTTCTGGA AGAAAGCTCA CATTCTCATC GATGGGTTGC AGTTAGGAGA 1300 

AGACTTTGTG GAGTTGCATA AACTTcTACA AAGGTCAGCT TTCCAAAATG 1350 

CAAAACTCTA TGGTCCTGAC ATCGGTCAGC CTCGAGGGAA GACAGTTAAA 1400 

CTGCTGAGGA GTTTCCTGAA GGCTGGCGGA GAAGTGATCG ACTCTCTTAC 1450

ATGGCATCAC TATTACTTGA ATGGACGCAT CGCTACCAAA GAAGATTTTC 1500 
TGAGCTCTGA TGCGCTGGAC ACTTTTATTC TCTCTGTGCA AAAAATTCTG 1550
AAGGTCACTA AAGAGATCAC ACCTGGCAAG AAGGTCTGGT TGGGAGAGAC 1600
GAGCTCAGCT TACGGTGGCG GTGCACCCTT GCTGTCCAAC ACCTTTGCAG 1650
CTGGCTTTAT GTGGCTGGAT AAATTGGGCC TGTCAGCCCA GATGGGCATA 1700
GAAGTCGTGA TGAGGCAGGT GTTCTTCGGA GCAGGCAACT ACCACTTAGT 1750
GGATGAAAAC TTTGAGCCTT TACCTGATTA CTGGCTCTCT CTTCTGTTCA 1800
AGAAACTGGT AGGTCCCAGG GTGTTACTGT CAAGAGTGAA AGGCCCAGAC 1850
AGGAGCAAAC TCCGAGTGTA TCTCCACTGC ACTAACGTCT ATCACCCACG 1900
ATATCAGGAA GGAGATCTAA CTCTGTATGT CCTGAACCTC CATAATGTCA 1950
CCAAGCACTT GAAGGTACCG CCTCCGTTGT TCAGGAAACC AGTGGATAC6 2000
TACCTTCTGA AGCCTTCGGG GCCGGATGGA TTACTTTCCA AATCTGTCCA 2050
ACTOAACGGT CAAATTCTGA AGATGGTGGA TGAGCAGACC CTGCCAGCTT 2100
TGACAGAAAA ACCTCTCCCC GCAGGAAGTG CACTAAGCCT GCCTGCCTTT 2150
TCCTATGGTT TTTTTOTCAT AAGAAATGCC AAAATCGCTG CTTGTATATG 2200
AAAATAAAAG GCATACGGTA CCCCTGAGAC AAAAGCCGAG GGGGGTGTTA 2250
TTCATAAAAC AAAACCCTAG TTTAGQAGGC CACCTCCTTG CCGAGTTCCA 2300
GAGCTTCGGG AGGGTGGGGT ACACTTCAGT ATTACATTCA GTGTGGTGTT 2350
CTCTCTAAGA AGAATACTGC AGGTGGTGAC AGTTAATAGC ACTGTG 2396



(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 535

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44 
Met Leu Arg Leu Leu Leu Leu Trp Leu Trp Gly Pro Leu Gly Ala 
5 10 15

Leu Ala Gln Gly Ala Pro Ala Gly Thr Ala Pro Thr Asp Asp Val
20 25 30

Val Asp Leu Glu Phe Tyr Thr Lys Arg Pro Leu Arg Ser Val Ser
35 40 45

Pro Ser Phe Leu Ser Ile Thr Ile Asp Ala Ser Leu Ala Thr Asp
50 55 60 

Pro Arg Phe Leu Thr Phe Leu Gly Ser Pro Arg Leu Arg Ala Leu 
65 70 75 

Ala Arg Gly Leu Ser Pro Ala Tyr Leu Arg Phe Gly Gly Thr Lys 
80 85 90

Thr Asp Phe Leu Ile Phe Asp Pro Asp Lys Glu Pro Thr Ser Glu
95 100 105

Glu Arg Ser Tyr Trp Lys Ser Gln Val Asn His Asp Ile Cys Arg
110 115 120

Ser Glu Pro Val Ser Ala Ala Val Leu Arg Lys Leu Gln Val Glu
125 130 135

Trp Pro Phe Gln Glu Leu Leu Leu Leu Arg Glu Gln Tyr Gln Lys
140 145 150 

Glu Phe Lys Asn Ser Thr Tyr Ser Arg Ser Ser Val Asp Met Leu 
155 160 165

Tyr Ser Phe Ala Lys Cys Ser Gly Leu Asp Leu Ile Phe Gly Leu
170 175 180

Asn Ala Leu Leu Arg Thr Pro Asp Leu Arg Trp Asn Ser Ser Asn
185 190 195

Ala Gln Leu Leu Leu Asp Tyr Cys Ser Ser Lys Gly Tyr Asn Ile
200 205 210

Ser Trp Glu Leu Gly Asn Glu Pro Asn Ser Phe Trp Lys Lys Ala
215 220 225

His Ile Leu Ile Asp Gly Leu Gln Leu Gly Glu Asp Phe Val Glu
230 235 240

Leu His Lys Leu Leu Gln Arg Ser Ala Phe Gln Asn Ala Lys Leu
245 250 255

Tyr Gly Pro Asp Ile Gly Gln Pro Arg Gly Lys Thr Val Lys Leu
260 265 270

Leu Arg Ser Phe Leu Lys Ala Gly Gly Glu Val Ile Asp Ser Leu
275 280 285

Thr Trp His His Tyr Tyr Leu Asn Gly Arg Ile Ala Thr Lys Glu
290 295 300

Asp Phe Leu Ser Ser Asp Ala Leu Asp Thr Phe Ile Leu Ser Val
305 310 315

Gln Lys Ile Leu Lys Val Thr Lys Glu Ile Thr Pro Gly Lys Lys
320 325 330 

Val Trp Leu Gly Glu Thr Ser Ser Ala Tyr Gly Gly Gly Ala Pro 
335 340 345 

Leu Leu Ser Asn Thr Phe Ala Ala Gly Phe Met Trp Leu Asp Lys 
350 355 360 

Leu Gly Leu Ser Ala Gln Met Gly Ile Glu Val Val Met Arg Gln
365 370 375

Val Phe Phe Gly Ala Gly Asn Tyr His Leu Val Asp Glu Asn Phe
380 385 390

Glu Pro Leu Pro Asp Tyr Trp Leu Ser Leu Leu Phe Lys Lys Leu 
395 400 405 

Val Gly Pro Arg Val Leu Leu Ser Arg Val Lys Gly Pro Asp Arg 
410 415 420 

Ser Lys Leu Arg Val Tyr Leu His Cys Thr Asn Val Tyr His Pro 
425 430 435 

Arg Tyr Gln Glu Gly Asp Leu Thr Leu Tyr Val Leu Asn Leu His
440 445 450 

Asn Val Thr Lys His Leu Lys Val Pro Pro Pro Leu Phe Arg Lys 
455 460 465 

Pro Val Asp Thr Tyr Leu Leu Lys Pro Ser Gly Pro Asp Gly Leu 
470 475 480 

Leu Ser Lys Ser Val Gln Leu Asn Gly Gln Ile Leu Lys Met Val
485 490 495

Asp Glu Gln Thr Leu Pro Ala Leu Thr Glu Lys Pro Leu Pro Ala
500 505 510

Gly Ser Ala Leu Ser Leu Pro Ala Phe Ser Tyr Gly Phe Phe Val
515 520 525

Ile Arg Asn Ala Lys Ile Ala Ala Cys Ile
530 535 



(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2396 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45

TT TCT AGT 8

TGC TTT TAG CCA ATG TCG GAT CAG GTT TTT CAA GCG ACA AAG AGA 53

TAG TGA GAT CCT GGG CAG AGG ACA TCC TAG CTC GGT CAG ATT TGG 98

GCA GGC TCA AGT GAC CAG TGT CTT AAG GCA GAA GGG AGT CGG GGT 143

AGG GTC TGG CTG AAC CCT CAA CCG GGG CTT TTA ACT CAG GGT CTA 188

GTC CTG GCG CCA AAT GGA TGG GAC CTA GAA AAG GTG ACA GAG TGC 233

GCA GGA CAC CAG GAA GCT GGT CCC ACC CCT GCG CGG CTC CCG GGC 278

GCT CCC TCC CCA GGC CTC CGA GGA TCT TGG ATT CTG GCC ACC TCC 323

GCA CCC TTT GGA TGG GTG TGG ATG ATT TCA AAA GTG GAC GTG ACC 368

GCG GCG GAG GGG AAA GCC AGC ACG GAA ATG AAA GAG AGC GAG GAG 413 

GGG AGG GCG GGG AGG GGA GGG CGC TAG GGA GGG ACT CCC GGG AGG 458 

GGT GGG AGG GAT GGA GCG CTG TGG GAG GGT ACT GAG TCC TGG CGC 503 

CAG AGG CGA AGC AGG ACC GGT TGC AGG GGG CTT GAG CCA GCG CGC 548 

CGG CTG CCC CAG CTC TCC CGG CAG CGG GCG GTC CAG CCA GGT GGG 593 

ATG CTG AGG CTG CTG CTG CTG TGG CTC TGG GGG CCG CTC GGT GCC 638 
Met Leu Arg Leu Leu Leu Leu Trp Leu Trp Gly Pro Leu Gly Ala 
5 10 15 

CTG GCC CAG GGC GCC CCC GCG GGG ACC GCG CCG ACC GAC GAC GTG 683 
Leu Ala Gln Gly Ala Pro Ala Gly Thr Ala Pro Thr Asp Asp Val
20 25 30 

GTA GAC TTG GAG TTT TAC ACC AAG CGG CCG CTC CGA AGC GTG AGT 728 
Val Asp Leu Glu Phe Tyr Thr Lys Arg Pro Leu Arg Ser Val Ser 
35 40 45 

CCC TCG TTC CTG TCC ATC ACC ATC GAC GCC AGC CTG GCC ACC GAC 773 
Pro Ser Phe Leu Ser Ile Thr Ile Asp Ala Ser Leu Ala Thr Asp
50 55 60 

CCG CGC TTC CTC ACC TTC CTG GGC TCT CCA AGG CTC CGT GCT CTG 818 
Pro Arg Phe Leu Thr Phe Leu Gly Ser Pro Arg Leu Arg Ala Leu 
65 70 75 

GCT AGA GGC TTA TCT CCT GCA TAC TTG AGA TTT GGC GGC ACA AAG 863 
Ala Arg Gly Leu Ser Pro Ala Tyr Leu Arg Phe Gly Gly Thr Lys 
80 85 90

ACT GAC TTC CTT ATT TTT GAT CCG GAC AAG GAA CCG ACT TCC GAA 908
Thr Asp Phe Leu Ile Phe Asp Pro Asp Lys Glu Pro Thr Ser Glu
95 100 105

GAA AGA AGT TAC TGG AAA TCT CAA GTC AAC CAT GAT ATT TGC AGG 953
Glu Arg Ser Tyr Trp Lys Ser Gln Val Asn His Asp Ile Cys Arg
110 115 120

TCT GAG CCG GTC TCT GCT GCG GTG TTG AGG AAA CTC CAG GTG GAA 998
Ser Glu Pro Val Ser Ala Ala Val Leu Arg Lys Leu Gln Val Glu
125 130 135

TGG CCC TTC CAG GAG CTG TTG CTG CTC CGA GAG CAG TAC CAA AAG 1043
Trp Pro Phe Gln Glu Leu Leu Leu Leu Arg Glu Gln Tyr Gln Lys
140 145 150 

GAG TTC AAG AAC AGC ACC TAC TCA AGA AGC TCA GTG GAC ATG CTC 1088 
Glu Phe Lys Asn Ser Thr Tyr Ser Arg Ser Ser Val Asp Met Leu 
155 160 165 

TAC AGT TTT GCC AAG TGC TCG GGG TTA GAC CTG ATC TTT GGT CTA 1133 
Tyr Ser Phe Ala Lys Cys Ser Gly Leu Asp Leu Ile Phe Gly Leu
170 175 180

AAT GCG TTA CTA CGA ACC CCA GAC TTA CGG TGG AAC AGc TCC AAC 1178
Asn Ala Leu Leu Arg Thr Pro Asp Leu Arg Trp Asn Ser Ser Asn
185 190 195

GCC CAG CTT CTC CTT GAC TAC TGC TCT TCC AAG GGT TAT AAC ATc 1223
Ala Gln Leu Leu Leu Asp Tyr Cys Ser Ser Lys Gly Tyr Asn Ile
200 205 210 

TCC TGG GAA CTG GGC AAT GAG CCC AAC AGT TTC TGG AAG AAA GCT 1268 
Ser Trp Glu Leu Gly Asn Glu Pro Asn Ser Phe Trp Lys Lys Ala 
215 220 225 

CAC ATT CTC ATC GAT GGG TTG CAG TTA GGA GAA GAC TTT GTG GAG 1313 
His Ile Leu Ile Asp Gly Leu Gln Leu Gly Glu Asp Phe Val Glu
230 235 240

TTG CAT AAA CTT cTA CAA AGG TCA GCT TTC CAA AAT GCA AAA CTC 1358
Leu His Lys Leu Leu Gln Arg Ser Ala Phe Gln Asn Ala Lys Leu
245 250 255

TAT GGT CCT GAC ATC GGT CAG CCT CGA GGG AAG ACA GTT AAA CTG 1403
Tyr Gly Pro Asp Ile Gly Gln Pro Arg Gly Lys Thr Val Lys Leu
260 265 270

CTG AGG AGT TTC CTG AAG GCT GGC GGA GAA GTG ATC GAC TCT CTT 1448
Leu Arg Ser Phe Leu Lys Ala Gly Gly Glu Val Ile Asp Ser Leu
275 280 285

ACA TGG CAT CAC TAT TAC TTG AAT GGA CGC ATC GCT ACC AAA GAA 1493
Thr Trp His His Tyr Tyr Leu Asn Gly Arg Ile Ala Thr Lys Glu
290 295 300

GAT TTT CTG AGC TCT GAT GCG CTG GAC ACT TTT ATT CTC TCT GTG 1538
Asp Phe Leu Ser Ser Asp Ala Leu Asp Thr Phe Ile Leu Ser Val
305 310 315

CAA AAA ATT CTG AAG GTC ACT AAA GAG ATC ACA CCT GGC AAG AAG 1583
Gln Lys Ile Leu Lys Val Thr Lys Glu Ile Thr Pro Gly Lys Lys
320 325 330 

GTC TGG TTG GGA GAG ACG AGC TCA GCT TAC GGT GGC GGT GCA CCC 1628 
Val Trp Leu Gly Glu Thr Ser Ser Ala Tyr Gly Gly Gly Ala Pro 
335 340 345 

TTG CTG TCC AAC ACC TTT GCA GCT GGC TTT ATG TGG CTG GAT AAA 1673 
Leu Leu Ser Asn Thr Phe Ala Ala Gly Phe Met Trp Leu Asp Lys 
350 355 360 

TTG GGC CTG TCA GCC CAG ATG GGC ATA GAA GTC GTG ATG AGG CAG 1718 
Leu Gly Leu Ser Ala Gln Met Gly Ile Glu Val Val Met Arg Gln
365 370 375 

GTG TTC TTC GGA GCA GGC AAC TAC CAC TTA GTG GAT GAA AAC TTT 1763 
Val Phe Phe Gly Ala Gly Asn Tyr His Leu Val Asp Glu Asn Phe 
380 385 390 

GAG CCT TTA CCT GAT TAC TGG CTC TCT CTT CTG TTC AAG AAA CTG 1808 
Glu Pro Leu Pro Asp Tyr Trp Leu Ser Leu Leu Phe Lys Lys Leu
395 400 405

GTA GGT CCC AGG GTG TTA CTG TCA AGA GTG AAA GGC CCA GAC AGG 1853
Val Gly Pro Arg Val Leu Leu Ser Arg Val Lys Gly Pro Asp Arg
410 415 420

AGC AAA CTC CGA GTG TAT CTC CAC TGC ACT AAC GTC TAT CAC CCA 1898 
Ser Lys Leu Arg Val Tyr Leu His Cys Thr Asn Val Tyr His Pro 
425 430 435 

CGA TAT CAG GAA GGA GAT CTA ACT CTG TAT GTC CTG AAC CTC CAT 1943
Arg Tyr Gln Glu Gly Asp Leu Thr Leu Tyr Val Leu Asn Leu His
440 445 450 

AAT GTC ACC AAG CAC TTG AAG GTA CCG CCT CCG TTG TTC AGG AAA 1988 
Asn Val Thr Lys His Leu Lys Val Pro Pro Pro Leu Phe Arg Lys 
455 460 465 

CCA GTG GAT ACG TAC CTT CTG AAG CCT TCG GGG CCG GAT GGA TTA 2033 
Pro Val Asp Thr Tyr Leu Leu Lys Pro Ser Gly Pro Asp Gly Leu 
470 475 480 

CTT TCC AAA TCT GTC CAA CTG AAC GGT CAA ATT CTG AAG ATG GTG 2078 
Leu Ser Lys Ser Val Gln Leu Asn Gly Gln Ile Leu Lys Met Val
485 490 495 

GAT GAG CAG ACC CTG CCA GCT TTG ACA GAA AAA CCT CTC CCC GCA 2123 
Asp Glu Gln Thr Leu Pro Ala Leu Thr Glu Lys Pro Leu Pro Ala
500 505 510

GGA AGT GCA CTA AGC CTG CCT GCC TTT TCC TAT GGT TTT TTT GTC 2168
Gly Ser Ala Leu Ser Leu Pro Ala Phe Ser Tyr Gly Phe Phe Val
515 520 525

ATA AGA AAT GCC AAA ATC GCT GCT TGT ATA TGA AAA TAA AAG GCA 2213
Ile Arg Asn Ala Lys Ile Ala Ala Cys Ile
530 535 

TAC GGT ACC CCT GAG ACA AAA GCC GAG GGG GGT GTT ATT CAT AAA 2258 

ACA AAA CCC TAG TTT AGG AGG CCA CCT CCT TGC CGA GTT CCA GAG 2303 

CTT CGG GAG GGT GGG GTA CAC TTC AGT ATT ACA TTC AGT GTG GTG 2348

TTC TCT CTA AGA AGA ATA CTG CAG GTG GTG ACA GTT AAT AGC ACT 2393 
GTG 2396



(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 385 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46 
CGGCCGCTGC TGCTGCTGTG GCTCTGGGGG CGGCTCCGTG CCCTGACCCA 50 
AGGCACTCCG GCGGGGACCG CGCCGACCAA AGACGTGGTG GACTTGGAGT 100 
TTTACACCAA GAGGCTATTC CAAAGCGTGA GTCCCTCGTT CCTGTCCATC 150 
ACCATCGACG CCAGTCTGGC CACCGACCCT CGGTTCCTCA CCTTCCTGAG 200 
CTCTCCACGG CTTCGAGCCC TGTCTAGAGG CTTATCTCCT GCGTACTTGA 250 
GATTTGGCGG CACCAAGACT GACTTCCTTA TTTTTGATCC CAACAACGAA 300 
CCCACCTCTG AAGAAAGAAG TTACTGGCAA TCTCAAGACA ACAATGATAT 350
TTGCGGGTCT GACCGGGTCT CCGCTGACGT GTTGA 385

(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 541

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47 

AAATCAGGAC ATATCCTTCA CTTATTTGCC TCTTGGTCAT ATTGGAGGCA 50 

TTTGTATTCA TTTTTAATAA CCCTCAAAAT AGTGCATGCA AAGTGCTAAG 100 

CGTCATTTGC CACATGGTGC CATTAACTGT CACCACCTGC AGTGGTCTAC 150 

TTAGAGAACA CCGCACTGGA TGTTAACACT GAAGCGCGTG CCCCGCCCTC 200 

CCGAGGCTCT GGATCCAGCG TTGAAGCTTG CCCCGCCCTC CCGAGGCTCT 250

GGATCCAGCA CTGGAGCATG CCCCGCCCTC CCGAGGCTCT GGAGCTTGCT 300 

AAGGAGTCCG CTCCCTACCG CTGGGGTTTT GCTTTATTCT TATGAATGAC 350

ACCCCTGACC GCTTTCGTCT CAGGGGTACT GTAATGCCTT TTATTTTCAT 400 

ATACAAGCTG CGATTTTGGC ATTTCTTATG ACAAAAAACC CATAGGAAAA 450 

GGCGGGCACG CTTAGTGAGC TTCCTGCGGG GAGAGGTTTT TCTGTTAGAG 500 

CTGGCANGGT CTGCTCATCG ACCATCTTCA GGCCTCGTGC C 541